Overview

Dataset statistics

Number of variables: 41
Number of observations: 1901539
Missing cells: 18885322
Missing cells (%): 24.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.3 GiB
Average record size in memory: 1.3 KiB
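The missing-cell share is the number of missing cells divided by the total number of cells (variables × observations). This minimal Python sketch recomputes it from the figures above:

```python
# Figures copied from the "Dataset statistics" block above.
N_VARIABLES = 41
N_OBSERVATIONS = 1_901_539
MISSING_CELLS = 18_885_322


def missing_cell_share(missing: int, rows: int, cols: int) -> float:
    """Return the fraction of all cells (rows x cols) that are missing."""
    return missing / (rows * cols)


share = missing_cell_share(MISSING_CELLS, N_OBSERVATIONS, N_VARIABLES)
print(f"Missing cells (%): {share:.1%}")  # Missing cells (%): 24.2%
```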

Variable types

DateTime: 1
Unsupported: 2
Text: 12
Numeric: 13
Categorical: 13

Alerts

latitude is highly overall correlated with BoroName [High correlation]
longitude is highly overall correlated with BoroName [High correlation]
number_of_persons_injured is highly overall correlated with number_of_motorist_injured and 1 other field [High correlation]
number_of_persons_killed is highly overall correlated with number_of_pedestrians_killed and 4 other fields [High correlation]
number_of_pedestrians_injured is highly overall correlated with contributing_factor_vehicle_4 [High correlation]
number_of_pedestrians_killed is highly overall correlated with number_of_persons_killed and 4 other fields [High correlation]
number_of_motorist_injured is highly overall correlated with number_of_persons_injured and 1 other field [High correlation]
number_of_motorist_killed is highly overall correlated with number_of_persons_killed and 1 other field [High correlation]
collision_id is highly overall correlated with crash_year [High correlation]
crash_year is highly overall correlated with collision_id [High correlation]
total_injured is highly overall correlated with number_of_persons_injured and 1 other field [High correlation]
total_killed is highly overall correlated with number_of_persons_killed and 4 other fields [High correlation]
number_of_cyclist_killed is highly overall correlated with number_of_persons_killed and 5 other fields [High correlation]
contributing_factor_vehicle_3 is highly overall correlated with number_of_cyclist_killed and 2 other fields [High correlation]
contributing_factor_vehicle_4 is highly overall correlated with number_of_pedestrians_injured and 3 other fields [High correlation]
contributing_factor_vehicle_5 is highly overall correlated with number_of_pedestrians_killed and 3 other fields [High correlation]
crash_month is highly overall correlated with holiday_name [High correlation]
holiday_name is highly overall correlated with crash_month and 1 other field [High correlation]
is_public_holiday is highly overall correlated with holiday_name [High correlation]
BoroName is highly overall correlated with latitude and 1 other field [High correlation]
severity is highly overall correlated with number_of_persons_killed and 2 other fields [High correlation]
number_of_cyclist_injured is highly imbalanced (91.7%) [Imbalance]
number_of_cyclist_killed is highly imbalanced (99.9%) [Imbalance]
is_public_holiday is highly imbalanced (83.7%) [Imbalance]
Number_of_involved_Vehicles is highly imbalanced (51.6%) [Imbalance]
zip_code has 455978 (24.0%) missing values [Missing]
on_street_name has 404419 (21.3%) missing values [Missing]
cross_street_name has 714151 (37.6%) missing values [Missing]
off_street_name has 1558366 (82.0%) missing values [Missing]
contributing_factor_vehicle_1 has 642751 (33.8%) missing values [Missing]
contributing_factor_vehicle_2 has 1650130 (86.8%) missing values [Missing]
contributing_factor_vehicle_3 has 1892805 (99.5%) missing values [Missing]
contributing_factor_vehicle_4 has 1899878 (99.9%) missing values [Missing]
contributing_factor_vehicle_5 has 1901079 (> 99.9%) missing values [Missing]
vehicle_type_code_2 has 375446 (19.7%) missing values [Missing]
vehicle_type_code_3 has 1770054 (93.1%) missing values [Missing]
vehicle_type_code_4 has 1871192 (98.4%) missing values [Missing]
vehicle_type_code_5 has 1893051 (99.6%) missing values [Missing]
holiday_name has 1856022 (97.6%) missing values [Missing]
number_of_persons_killed is highly skewed (γ1 = 33.86868759) [Skewed]
number_of_pedestrians_killed is highly skewed (γ1 = 42.47340165) [Skewed]
number_of_motorist_killed is highly skewed (γ1 = 54.45987653) [Skewed]
total_killed is highly skewed (γ1 = 34.1523306) [Skewed]
collision_id has unique values [Unique]
crash_time is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
geometry is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
number_of_persons_injured has 1451297 (76.3%) zeros [Zeros]
number_of_persons_killed has 1898797 (99.9%) zeros [Zeros]
number_of_pedestrians_injured has 1797087 (94.5%) zeros [Zeros]
number_of_pedestrians_killed has 1900129 (99.9%) zeros [Zeros]
number_of_motorist_injured has 1616538 (85.0%) zeros [Zeros]
number_of_motorist_killed has 1900485 (99.9%) zeros [Zeros]
crash_hour has 62782 (3.3%) zeros [Zeros]
total_injured has 1451188 (76.3%) zeros [Zeros]
total_killed has 1898794 (99.9%) zeros [Zeros]
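Each Missing alert's percentage is the missing count divided by the 1901539 observations. The sketch below recomputes those percentages and lists the columns above a 95% cutoff. The cutoff (`MOSTLY_EMPTY_PCT`) is a hypothetical choice for illustration, not something the report applies.

```python
N_OBSERVATIONS = 1_901_539

# Missing-value counts copied from the "Missing" alerts above.
MISSING_COUNTS = {
    "zip_code": 455_978,
    "on_street_name": 404_419,
    "cross_street_name": 714_151,
    "off_street_name": 1_558_366,
    "contributing_factor_vehicle_1": 642_751,
    "contributing_factor_vehicle_2": 1_650_130,
    "contributing_factor_vehicle_3": 1_892_805,
    "contributing_factor_vehicle_4": 1_899_878,
    "contributing_factor_vehicle_5": 1_901_079,
    "vehicle_type_code_2": 375_446,
    "vehicle_type_code_3": 1_770_054,
    "vehicle_type_code_4": 1_871_192,
    "vehicle_type_code_5": 1_893_051,
    "holiday_name": 1_856_022,
}

# Hypothetical cutoff for "mostly empty" columns; not part of the report.
MOSTLY_EMPTY_PCT = 95.0


def missing_pct(count: int, total: int = N_OBSERVATIONS) -> float:
    """Percentage of rows in which a column is missing."""
    return 100.0 * count / total


mostly_empty = sorted(
    col for col, count in MISSING_COUNTS.items()
    if missing_pct(count) > MOSTLY_EMPTY_PCT
)
for col in mostly_empty:
    print(f"{col}: {missing_pct(MISSING_COUNTS[col]):.2f}% missing")
```

Check before dropping anything, because some of this missingness is probably structural rather than lost data. For example, `holiday_name` is presumably empty on ordinary days (it is correlated with `is_public_holiday`), and `contributing_factor_vehicle_3` to `contributing_factor_vehicle_5` can only be filled for crashes that involve that many vehicles.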

Reproduction

Analysis started: 2025-04-25 23:00:00.468255
Analysis finished: 2025-04-25 23:06:56.779568
Duration: 6 minutes and 56.31 seconds
Software version: ydata-profiling v4.5.1

Variables

Distinct: 4672
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 29.0 MiB
Minimum: 2012-07-01 00:00:00
Maximum: 2025-04-15 00:00:00
Histogram with fixed size bins (bins=50)
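The 4672 distinct values equal the number of calendar days from the minimum (2012-07-01) to the maximum (2025-04-15), both ends included. Assuming this column holds dates only (both extremes fall at midnight, and the time of day lives in `crash_time`), every day in the range has at least one record. A quick check with the standard library:

```python
from datetime import date

# Minimum, maximum and distinct count of the DateTime column, as reported above.
FIRST_DAY = date(2012, 7, 1)
LAST_DAY = date(2025, 4, 15)
DISTINCT_VALUES = 4672


def days_in_range(first: date, last: date) -> int:
    """Number of calendar days from first to last, both ends included."""
    return (last - first).days + 1


span = days_in_range(FIRST_DAY, LAST_DAY)
print(span, span == DISTINCT_VALUES)  # 4672 True
```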

crash_time
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 101.6 MiB

zip_code
Text

MISSING 

Distinct: 233
Distinct (%): < 0.1%
Missing: 455978
Missing (%): 24.0%
Memory size: 113.9 MiB

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 7227805
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
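The category tables in this report classify each character by its Unicode general category. As an illustration (not necessarily how ydata-profiling computes it internally), Python's standard `unicodedata.category` returns the two-letter category code. The mapping to long names below is written by hand for the categories that appear in this report.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Cc": "Control",
}


def category_counts(values: Iterable[str]) -> Counter:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["11207", "11236", "     "]))
# Counter({'Decimal Number': 10, 'Space Separator': 5})
```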

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 11230
2nd row: 11208
3rd row: 10475
4th row: 11207
5th row: 10017

Most occurring words

Value | Count | Frequency (%)
11207 | 28482 | 2.0%
11236 | 19822 | 1.4%
11101 | 19396 | 1.3%
11203 | 18806 | 1.3%
11234 | 18286 | 1.3%
11385 | 18061 | 1.2%
11208 | 17665 | 1.2%
11212 | 17471 | 1.2%
11226 | 17305 | 1.2%
11201 | 17264 | 1.2%
Other values (222) | 1252972 | 86.7%

Most occurring characters

Value | Count | Frequency (%)
1 | 2807279 | 38.8%
0 | 1274113 | 17.6%
2 | 850683 | 11.8%
3 | 633215 | 8.8%
4 | 516483 | 7.1%
6 | 322852 | 4.5%
5 | 284288 | 3.9%
7 | 246661 | 3.4%
8 | 152087 | 2.1%
9 | 139989 | 1.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 7227650 | > 99.9%
Space Separator | 155 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 2807279 | 38.8%
0 | 1274113 | 17.6%
2 | 850683 | 11.8%
3 | 633215 | 8.8%
4 | 516483 | 7.1%
6 | 322852 | 4.5%
5 | 284288 | 3.9%
7 | 246661 | 3.4%
8 | 152087 | 2.1%
9 | 139989 | 1.9%
Space Separator
Value | Count | Frequency (%)
(space) | 155 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 7227805 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 7227805 | 100.0%

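zip_code has 233 distinct values but only 232 distinct words (10 listed plus 222 others), and 155 of its characters are spaces even though every value is exactly 5 characters long. One reading consistent with these figures is that a blank placeholder of five spaces is stored alongside the real codes (that would be 155 / 5 = 31 rows, if all spaces come from such placeholders; the report does not say). The sketch below treats such values as missing. `normalize_zip` is a hypothetical helper, and "exactly five ASCII digits" is an assumed validity rule.

```python
import re
from typing import Optional

# Assumed rule: a valid code is exactly five ASCII digits.
ZIP5 = re.compile(r"[0-9]{5}")


def normalize_zip(raw: Optional[str]) -> Optional[str]:
    """Return a 5-digit ZIP code string, or None for blank/invalid input."""
    if raw is None:
        return None
    cleaned = raw.strip()
    if ZIP5.fullmatch(cleaned):
        return cleaned
    return None  # blank ("     ") or malformed values count as missing


print([normalize_zip(z) for z in ["11230", "     ", None, "1123"]])
# ['11230', None, None, None]
```

Keeping the result as a string rather than an integer is deliberate: ZIP codes are identifiers, not quantities.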

latitude
Real number (ℝ)

HIGH CORRELATION 

Distinct: 128281
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.723911
Minimum: 40.498947
Maximum: 40.912884
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: 40.498947
5-th percentile: 40.597683
Q1: 40.667915
median: 40.720673
Q3: 40.769653
95-th percentile: 40.862152
Maximum: 40.912884
Range: 0.413937
Interquartile range (IQR): 0.1017376

Descriptive statistics

Standard deviation: 0.079187767
Coefficient of variation (CV): 0.001944503
Kurtosis: -0.55754253
Mean: 40.723911
Median Absolute Deviation (MAD): 0.0512144
Skewness: 0.11440902
Sum: 77438106
Variance: 0.0062707024
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
40.861862 | 914 | < 0.1%
40.696033 | 801 | < 0.1%
40.759308 | 631 | < 0.1%
40.8047 | 597 | < 0.1%
40.675735 | 589 | < 0.1%
40.6960346 | 587 | < 0.1%
40.658577 | 547 | < 0.1%
40.75898 | 500 | < 0.1%
40.69168 | 491 | < 0.1%
40.7606005 | 474 | < 0.1%
Other values (128271) | 1895408 | 99.7%

Minimum 10 values

Value | Count | Frequency (%)
40.498947 | 1 | < 0.1%
40.4989488 | 2 | < 0.1%
40.4991346 | 1 | < 0.1%
40.49931 | 1 | < 0.1%
40.4994787 | 1 | < 0.1%
40.499659 | 1 | < 0.1%
40.499672 | 1 | < 0.1%
40.49971 | 1 | < 0.1%
40.49984 | 1 | < 0.1%
40.499842 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
40.912884 | 13 | < 0.1%
40.9128276 | 1 | < 0.1%
40.912827 | 2 | < 0.1%
40.912647 | 1 | < 0.1%
40.91257 | 1 | < 0.1%
40.912537 | 2 | < 0.1%
40.9124681 | 24 | < 0.1%
40.912468 | 18 | < 0.1%
40.912292 | 1 | < 0.1%
40.9122231 | 4 | < 0.1%

longitude
Real number (ℝ)

HIGH CORRELATION 

Distinct: 99584
Distinct (%): 5.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.920058
Minimum: -74.25496
Maximum: -73.70055
Zeros: 0
Zeros (%): 0.0%
Negative: 1901539
Negative (%): 100.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: -74.25496
5-th percentile: -74.03552
Q1: -73.974754
median: -73.9271
Q3: -73.86719
95-th percentile: -73.765072
Maximum: -73.70055
Range: 0.55441
Interquartile range (IQR): 0.107564

Descriptive statistics

Standard deviation: 0.086177891
Coefficient of variation (CV): -0.0011658255
Kurtosis: 0.8578712
Mean: -73.920058
Median Absolute Deviation (MAD): 0.05244
Skewness: -0.20288511
Sum: -1.4056187 × 10⁸
Variance: 0.007426629
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
-73.89063 | 782 | < 0.1%
-73.98453 | 717 | < 0.1%
-73.91282 | 717 | < 0.1%
-73.89686 | 678 | < 0.1%
-73.91243 | 654 | < 0.1%
-73.94476 | 624 | < 0.1%
-73.9112 | 594 | < 0.1%
-73.9845292 | 587 | < 0.1%
-73.882744 | 552 | < 0.1%
-73.91727 | 543 | < 0.1%
Other values (99574) | 1895091 | 99.7%

Minimum 10 values

Value | Count | Frequency (%)
-74.25496 | 1 | < 0.1%
-74.254845 | 1 | < 0.1%
-74.2545316 | 1 | < 0.1%
-74.25393 | 2 | < 0.1%
-74.253174 | 1 | < 0.1%
-74.2530308 | 1 | < 0.1%
-74.253006 | 2 | < 0.1%
-74.2529994 | 2 | < 0.1%
-74.252884 | 1 | < 0.1%
-74.2528764 | 3 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
-73.70055 | 2 | < 0.1%
-73.700584 | 11 | < 0.1%
-73.7005968 | 10 | < 0.1%
-73.7006 | 1 | < 0.1%
-73.70061 | 5 | < 0.1%
-73.70071 | 4 | < 0.1%
-73.70073 | 1 | < 0.1%
-73.70074 | 1 | < 0.1%
-73.70076 | 2 | < 0.1%
-73.7007673 | 1 | < 0.1%
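Both coordinate columns report a coefficient of variation (standard deviation divided by mean) and an IQR (Q3 minus Q1). Recomputing them from the rounded figures above shows where longitude's negative CV comes from: every longitude is negative (west of Greenwich), so the mean, and therefore the ratio, is negative. Because the CV depends on where the scale's zero lies, it says little about the spread of coordinates; the standard deviation or IQR is more informative here.

```python
def coefficient_of_variation(std: float, mean: float) -> float:
    """CV = standard deviation / mean; it takes the sign of the mean."""
    return std / mean


def iqr(q1: float, q3: float) -> float:
    """Interquartile range: Q3 - Q1."""
    return q3 - q1


# Rounded figures copied from the latitude and longitude sections above.
lat_cv = coefficient_of_variation(0.079187767, 40.723911)
lon_cv = coefficient_of_variation(0.086177891, -73.920058)
lat_iqr = iqr(40.667915, 40.769653)
lon_iqr = iqr(-73.974754, -73.86719)
print(lat_cv, lon_cv, lat_iqr, lon_iqr)
```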
Distinct: 311956
Distinct (%): 16.4%
Missing: 0
Missing (%): 0.0%
Memory size: 159.2 MiB

Length

Max length: 25
Median length: 24
Mean length: 22.77039
Min length: 16

Characters and Unicode

Total characters: 43298784
Distinct characters: 16
Distinct categories: 6
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 177073
Unique (%): 9.3%

Sample

1st row: (40.62179, -73.970024)
2nd row: (40.667202, -73.8665)
3rd row: (40.709183, -73.956825)
4th row: (40.86816, -73.83148)
5th row: (40.67172, -73.8971)

Most occurring words

Value | Count | Frequency (%)
40.861862 | 914 | < 0.1%
40.696033 | 801 | < 0.1%
73.89063 | 782 | < 0.1%
73.91282 | 717 | < 0.1%
73.98453 | 717 | < 0.1%
73.89686 | 678 | < 0.1%
73.91243 | 654 | < 0.1%
40.759308 | 631 | < 0.1%
73.94476 | 624 | < 0.1%
40.8047 | 597 | < 0.1%
Other values (227855) | 3795963 | 99.8%

Most occurring characters

Value | Count | Frequency (%)
7 | 4744864 | 11.0%
4 | 4113725 | 9.5%
. | 3803078 | 8.8%
3 | 3614874 | 8.3%
0 | 3492887 | 8.1%
9 | 2782758 | 6.4%
8 | 2734348 | 6.3%
6 | 2705540 | 6.2%
5 | 2161947 | 5.0%
( | 1901539 | 4.4%
Other values (6) | 11243224 | 26.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 29988011 | 69.3%
Other Punctuation | 5704617 | 13.2%
Open Punctuation | 1901539 | 4.4%
Space Separator | 1901539 | 4.4%
Dash Punctuation | 1901539 | 4.4%
Close Punctuation | 1901539 | 4.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
7 | 4744864 | 15.8%
4 | 4113725 | 13.7%
3 | 3614874 | 12.1%
0 | 3492887 | 11.6%
9 | 2782758 | 9.3%
8 | 2734348 | 9.1%
6 | 2705540 | 9.0%
5 | 2161947 | 7.2%
2 | 1836428 | 6.1%
1 | 1800640 | 6.0%
Other Punctuation
Value | Count | Frequency (%)
. | 3803078 | 66.7%
, | 1901539 | 33.3%
Open Punctuation
Value | Count | Frequency (%)
( | 1901539 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 1901539 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 1901539 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 1901539 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 43298784 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 43298784 | 100.0%

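Every value in this column appears to follow the same "(lat, lon)" pattern. '(', ')', the space, ',' and '-' each occur exactly 1901539 times (once per row), and '.' occurs exactly twice that (3803078). The '-' is the longitude's sign, which is also why the word table lists longitudes such as 73.89063 without it: word splitting apparently breaks on '-'. Below is a sketch that turns such a string back into two floats. `parse_location` is a hypothetical helper, and the regular expression is based only on the sample rows above.

```python
import re
from typing import Tuple

# "(<lat>, <lon>)" as seen in the sample rows; an assumption, not a schema.
LOCATION = re.compile(r"\(\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_location(text: str) -> Tuple[float, float]:
    """Split a '(lat, lon)' string into (latitude, longitude) floats."""
    match = LOCATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unexpected location format: {text!r}")
    return float(match.group(1)), float(match.group(2))


print(parse_location("(40.62179, -73.970024)"))  # (40.62179, -73.970024)
```

Since latitude and longitude are already separate numeric columns, this is mainly useful for checking that the string column agrees with them.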

on_street_name
Text

MISSING 

Distinct: 15502
Distinct (%): 1.0%
Missing: 404419
Missing (%): 21.3%
Memory size: 149.5 MiB

Length

Max length: 32
Median length: 32
Mean length: 28.918011
Min length: 4

Characters and Unicode

Total characters: 43293733
Distinct characters: 71
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3205
Unique (%): 0.2%

Sample

1st row: OCEAN PARKWAY
2nd row: BROOKLYN QUEENS EXPRESSWAY
3rd row: 3 AVENUE
4th row: MYRTLE AVENUE
5th row: SPRINGFIELD BOULEVARD

Most occurring words

Value | Count | Frequency (%)
avenue | 565008 | 16.5%
street | 492131 | 14.3%
east | 144988 | 4.2%
boulevard | 109706 | 3.2%
west | 108140 | 3.2%
parkway | 63394 | 1.8%
road | 58543 | 1.7%
expressway | 51508 | 1.5%
island | 28031 | 0.8%
queens | 22604 | 0.7%
Other values (4400) | 1787256 | 52.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 24568140 | 56.7%
E | 3363038 | 7.8%
A | 1750796 | 4.0%
T | 1707359 | 3.9%
R | 1461187 | 3.4%
S | 1291668 | 3.0%
N | 1290048 | 3.0%
U | 882479 | 2.0%
V | 784172 | 1.8%
O | 751465 | 1.7%
Other values (61) | 5443381 | 12.6%

Most occurring categories

Value | Count | Frequency (%)
Space Separator | 24568140 | 56.7%
Uppercase Letter | 17545098 | 40.5%
Decimal Number | 1123290 | 2.6%
Lowercase Letter | 50286 | 0.1%
Open Punctuation | 2467 | < 0.1%
Close Punctuation | 2466 | < 0.1%
Other Punctuation | 1924 | < 0.1%
Dash Punctuation | 62 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 3363038 | 19.2%
A | 1750796 | 10.0%
T | 1707359 | 9.7%
R | 1461187 | 8.3%
S | 1291668 | 7.4%
N | 1290048 | 7.4%
U | 882479 | 5.0%
V | 784172 | 4.5%
O | 751465 | 4.3%
L | 573758 | 3.3%
Other values (16) | 3689128 | 21.0%
Lowercase Letter
Value | Count | Frequency (%)
n | 5225 | 10.4%
e | 4801 | 9.5%
r | 4454 | 8.9%
y | 4124 | 8.2%
a | 3493 | 6.9%
o | 3355 | 6.7%
l | 2893 | 5.8%
s | 2857 | 5.7%
k | 2631 | 5.2%
t | 2291 | 4.6%
Other values (16) | 14162 | 28.2%
Decimal Number
Value | Count | Frequency (%)
1 | 254436 | 22.7%
3 | 127110 | 11.3%
2 | 124958 | 11.1%
4 | 106502 | 9.5%
5 | 104186 | 9.3%
6 | 91682 | 8.2%
8 | 84212 | 7.5%
7 | 83740 | 7.5%
9 | 74775 | 6.7%
0 | 71689 | 6.4%
Other Punctuation
Value | Count | Frequency (%)
. | 1510 | 78.5%
/ | 383 | 19.9%
' | 26 | 1.4%
& | 4 | 0.2%
# | 1 | 0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 24568140 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 2467 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 2466 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 62 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 25698349 | 59.4%
Latin | 17595384 | 40.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 3363038 | 19.1%
A | 1750796 | 10.0%
T | 1707359 | 9.7%
R | 1461187 | 8.3%
S | 1291668 | 7.3%
N | 1290048 | 7.3%
U | 882479 | 5.0%
V | 784172 | 4.5%
O | 751465 | 4.3%
L | 573758 | 3.3%
Other values (42) | 3739414 | 21.3%
Common
Value | Count | Frequency (%)
(space) | 24568140 | 95.6%
1 | 254436 | 1.0%
3 | 127110 | 0.5%
2 | 124958 | 0.5%
4 | 106502 | 0.4%
5 | 104186 | 0.4%
6 | 91682 | 0.4%
8 | 84212 | 0.3%
7 | 83740 | 0.3%
9 | 74775 | 0.3%
Other values (9) | 78608 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 43293733 | 100.0%

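For on_street_name, the max and median length are both 32, and spaces make up 56.7% of all characters even though typical names are two or three words. This suggests the values are right-padded to a fixed width of 32 characters. cross_street_name also tops out at 32, and off_street_name has both max and median length at 40. The 50286 lowercase letters also point to some mixed-case entries. These are inferences from the profile, not documented facts. Under those assumptions, a minimal normalization sketch (`normalize_street` is a hypothetical helper) is:

```python
from typing import Optional


def normalize_street(raw: Optional[str]) -> Optional[str]:
    """Trim padding, collapse inner whitespace and upper-case a street name.

    Returns None for missing or all-blank values.
    """
    if raw is None:
        return None
    collapsed = " ".join(raw.split()).upper()
    return collapsed or None


print(repr(normalize_street("OCEAN PARKWAY".ljust(32))))  # 'OCEAN PARKWAY'
print(repr(normalize_street("Myrtle  Avenue")))           # 'MYRTLE AVENUE'
print(repr(normalize_street(" " * 32)))                   # None
```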

cross_street_name
Text

MISSING 

Distinct: 17646
Distinct (%): 1.5%
Missing: 714151
Missing (%): 37.6%
Memory size: 125.9 MiB

Length

Max length: 32
Median length: 31
Mean length: 22.162937
Min length: 3

Characters and Unicode

Total characters: 26316005
Distinct characters: 66
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3115
Unique (%): 0.3%

Sample

1st row: AVENUE K
2nd row: EAST 43 STREET
3rd row: EAST GATE PLAZA
4th row: 150 STREET
5th row: HEATH AVENUE

Most occurring words

Value | Count | Frequency (%)
avenue | 526600 | 20.1%
street | 427312 | 16.3%
east | 105100 | 4.0%
west | 65102 | 2.5%
boulevard | 58025 | 2.2%
road | 47113 | 1.8%
place | 31235 | 1.2%
3 | 18046 | 0.7%
parkway | 17055 | 0.7%
broadway | 16373 | 0.6%
Other values (4663) | 1303925 | 49.8%

Most occurring characters

Value | Count | Frequency (%)
(space) | 12596230 | 47.9%
E | 2703052 | 10.3%
T | 1352338 | 5.1%
A | 1277342 | 4.9%
R | 1015168 | 3.9%
N | 981532 | 3.7%
S | 897882 | 3.4%
U | 714689 | 2.7%
V | 658531 | 2.5%
O | 511547 | 1.9%
Other values (56) | 3607694 | 13.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 12703853 | 48.3%
Space Separator | 12596230 | 47.9%
Decimal Number | 1014388 | 3.9%
Lowercase Letter | 1470 | < 0.1%
Other Punctuation | 53 | < 0.1%
Dash Punctuation | 11 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 2703052 | 21.3%
T | 1352338 | 10.6%
A | 1277342 | 10.1%
R | 1015168 | 8.0%
N | 981532 | 7.7%
S | 897882 | 7.1%
U | 714689 | 5.6%
V | 658531 | 5.2%
O | 511547 | 4.0%
L | 389902 | 3.1%
Other values (16) | 2201870 | 17.3%
Lowercase Letter
Value | Count | Frequency (%)
e | 279 | 19.0%
t | 174 | 11.8%
r | 129 | 8.8%
a | 129 | 8.8%
n | 98 | 6.7%
s | 97 | 6.6%
v | 75 | 5.1%
o | 73 | 5.0%
l | 66 | 4.5%
d | 60 | 4.1%
Other values (14) | 290 | 19.7%
Decimal Number
Value | Count | Frequency (%)
1 | 224868 | 22.2%
2 | 119158 | 11.7%
3 | 111500 | 11.0%
5 | 91639 | 9.0%
4 | 91280 | 9.0%
7 | 80936 | 8.0%
8 | 80374 | 7.9%
6 | 79821 | 7.9%
9 | 69731 | 6.9%
0 | 65081 | 6.4%
Other Punctuation
Value | Count | Frequency (%)
' | 43 | 81.1%
& | 5 | 9.4%
/ | 4 | 7.5%
. | 1 | 1.9%
Space Separator
Value | Count | Frequency (%)
(space) | 12596230 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 11 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 13610682 | 51.7%
Latin | 12705323 | 48.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 2703052 | 21.3%
T | 1352338 | 10.6%
A | 1277342 | 10.1%
R | 1015168 | 8.0%
N | 981532 | 7.7%
S | 897882 | 7.1%
U | 714689 | 5.6%
V | 658531 | 5.2%
O | 511547 | 4.0%
L | 389902 | 3.1%
Other values (40) | 2203340 | 17.3%
Common
Value | Count | Frequency (%)
(space) | 12596230 | 92.5%
1 | 224868 | 1.7%
2 | 119158 | 0.9%
3 | 111500 | 0.8%
5 | 91639 | 0.7%
4 | 91280 | 0.7%
7 | 80936 | 0.6%
8 | 80374 | 0.6%
6 | 79821 | 0.6%
9 | 69731 | 0.5%
Other values (6) | 65145 | 0.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 26316005 | 100.0%

off_street_name
Text

MISSING 

Distinct: 225364
Distinct (%): 65.7%
Missing: 1558366
Missing (%): 82.0%
Memory size: 92.1 MiB

Length

Max length: 40
Median length: 40
Mean length: 34.838064
Min length: 8

Characters and Unicode

Total characters: 11955483
Distinct characters: 72
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 175768
Unique (%): 51.2%

Sample

1st row: 1211 LORING AVENUE
2nd row: 344 BAYCHESTER AVENUE
3rd row: 2047 PITKIN AVENUE
4th row: 480 DEAN STREET
5th row: 878 FLATBUSH AVENUE

Most occurring words

Value | Count | Frequency (%)
avenue | 136217 | 12.2%
street | 126509 | 11.4%
east | 32883 | 3.0%
west | 23418 | 2.1%
boulevard | 20986 | 1.9%
road | 15257 | 1.4%
place | 6516 | 0.6%
parkway | 6500 | 0.6%
broadway | 5345 | 0.5%
ave | 4987 | 0.4%
Other values (25183) | 734301 | 66.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 6526437 | 54.6%
E | 784389 | 6.6%
T | 426899 | 3.6%
A | 388840 | 3.3%
R | 320149 | 2.7%
N | 283655 | 2.4%
S | 283088 | 2.4%
1 | 277555 | 2.3%
U | 198786 | 1.7%
2 | 188384 | 1.6%
Other values (62) | 2277301 | 19.0%

Most occurring categories

Value | Count | Frequency (%)
Space Separator | 6526437 | 54.6%
Uppercase Letter | 3894873 | 32.6%
Decimal Number | 1449113 | 12.1%
Dash Punctuation | 80307 | 0.7%
Other Punctuation | 3068 | < 0.1%
Lowercase Letter | 1043 | < 0.1%
Open Punctuation | 322 | < 0.1%
Close Punctuation | 319 | < 0.1%
Control | 1 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 784389 | 20.1%
T | 426899 | 11.0%
A | 388840 | 10.0%
R | 320149 | 8.2%
N | 283655 | 7.3%
S | 283088 | 7.3%
U | 198786 | 5.1%
V | 182842 | 4.7%
O | 167178 | 4.3%
L | 126822 | 3.3%
Other values (16) | 732225 | 18.8%
Lowercase Letter
Value | Count | Frequency (%)
e | 197 | 18.9%
t | 134 | 12.8%
v | 85 | 8.1%
n | 82 | 7.9%
r | 78 | 7.5%
a | 68 | 6.5%
o | 58 | 5.6%
s | 50 | 4.8%
d | 43 | 4.1%
h | 42 | 4.0%
Other values (14) | 206 | 19.8%
Decimal Number
Value | Count | Frequency (%)
1 | 277555 | 19.2%
2 | 188384 | 13.0%
0 | 160265 | 11.1%
3 | 148908 | 10.3%
5 | 145623 | 10.0%
4 | 130773 | 9.0%
6 | 105677 | 7.3%
7 | 103035 | 7.1%
8 | 98380 | 6.8%
9 | 90513 | 6.2%
Other Punctuation
Value | Count | Frequency (%)
/ | 2675 | 87.2%
& | 240 | 7.8%
. | 120 | 3.9%
@ | 18 | 0.6%
' | 11 | 0.4%
* | 3 | 0.1%
: | 1 | < 0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 6526437 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 80307 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 322 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 319 | 100.0%
Control
Value | Count | Frequency (%)
(control character) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 8059567 | 67.4%
Latin | 3895916 | 32.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 784389 | 20.1%
T | 426899 | 11.0%
A | 388840 | 10.0%
R | 320149 | 8.2%
N | 283655 | 7.3%
S | 283088 | 7.3%
U | 198786 | 5.1%
V | 182842 | 4.7%
O | 167178 | 4.3%
L | 126822 | 3.3%
Other values (40) | 733268 | 18.8%
Common
Value | Count | Frequency (%)
(space) | 6526437 | 81.0%
1 | 277555 | 3.4%
2 | 188384 | 2.3%
0 | 160265 | 2.0%
3 | 148908 | 1.8%
5 | 145623 | 1.8%
4 | 130773 | 1.6%
6 | 105677 | 1.3%
7 | 103035 | 1.3%
8 | 98380 | 1.2%
Other values (12) | 174530 | 2.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11955483 | 100.0%

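off_street_name values look like a house number followed by a street (see the sample rows), and the word table shows the abbreviation `ave` (4987 occurrences) next to `avenue`. The sketch below splits off a leading house number and expands a few suffix abbreviations. The `SUFFIXES` map is hypothetical and deliberately short. Allowing a hyphenated house number (as in Queens-style addresses) is an assumption; the report only shows that 80307 dash characters occur somewhere in these values.

```python
import re
from typing import Optional, Tuple

# Hypothetical, deliberately incomplete abbreviation map.
SUFFIXES = {
    "AVE": "AVENUE", "ST": "STREET", "BLVD": "BOULEVARD",
    "RD": "ROAD", "PL": "PLACE", "PKWY": "PARKWAY",
}
SUFFIX_WORDS = set(SUFFIXES) | set(SUFFIXES.values())

# Leading house number, optionally hyphenated (e.g. "37-12"), then the rest.
HOUSE_NUMBER = re.compile(r"(\d+(?:-\d+)?)\s+(.*)")


def split_address(raw: str) -> Tuple[Optional[str], str]:
    """Return (house_number or None, normalized street name)."""
    text = " ".join(raw.split()).upper()  # drop padding, collapse spaces
    number, street = None, text
    match = HOUSE_NUMBER.fullmatch(text)
    # Do not split "150 STREET": a bare suffix is not a street name.
    if match and match.group(2) not in SUFFIX_WORDS:
        number, street = match.group(1), match.group(2)
    words = street.split()
    if words and words[-1] in SUFFIXES:
        words[-1] = SUFFIXES[words[-1]]  # expand a trailing abbreviation
    return number, " ".join(words)


print(split_address("1211 LORING AVENUE".ljust(40)))  # ('1211', 'LORING AVENUE')
print(split_address("480 Dean St"))                   # ('480', 'DEAN STREET')
```

A bare numbered street such as `150 STREET` stays intact because its remainder is only a suffix word. A name like `EAST 43 STREET` does not start with a digit, so it is never split.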

number_of_persons_injured
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 30
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.32015068
Minimum: 0
Maximum: 43
Zeros: 1451297
Zeros (%): 76.3%
Negative: 0
Negative (%): 0.0%
Memory size: 30.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 43
Range: 43
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.70602942
Coefficient of variation (CV): 2.2053035
Kurtosis: 44.513387
Mean: 0.32015068
Median Absolute Deviation (MAD): 0
Skewness: 4.0766332
Sum: 608779
Variance: 0.49847754
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=30)

Common values

Value | Count | Frequency (%)
0 | 1451297 | 76.3%
1 | 350081 | 18.4%
2 | 65567 | 3.4%
3 | 21411 | 1.1%
4 | 7862 | 0.4%
5 | 2991 | 0.2%
6 | 1232 | 0.1%
7 | 514 | < 0.1%
8 | 234 | < 0.1%
9 | 111 | < 0.1%
Other values (20) | 239 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1451297 | 76.3%
1 | 350081 | 18.4%
2 | 65567 | 3.4%
3 | 21411 | 1.1%
4 | 7862 | 0.4%
5 | 2991 | 0.2%
6 | 1232 | 0.1%
7 | 514 | < 0.1%
8 | 234 | < 0.1%
9 | 111 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
43 | 1 | < 0.1%
34 | 1 | < 0.1%
32 | 1 | < 0.1%
27 | 1 | < 0.1%
25 | 1 | < 0.1%
24 | 3 | < 0.1%
23 | 1 | < 0.1%
22 | 3 | < 0.1%
21 | 1 | < 0.1%
20 | 2 | < 0.1%
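The quartiles of number_of_persons_injured are all 0 while the 95th percentile is 2. That follows from the counts: 94.7% of rows have at most one injury ((1451297 + 350081) / 1901539), so the 95% mark falls among the rows with two. The sketch below recomputes quantiles directly from the value-count table, assuming the linear-interpolation rule that NumPy and pandas use by default; `quantile_from_counts` is a hypothetical helper.

```python
from typing import Dict

# Value -> count for number_of_persons_injured, copied from the tables above.
# The 239 rows in "Other values (20)" (all >= 10) are lumped at 10 here; this
# approximation only affects quantiles above about the 99.98th percentile.
PERSONS_INJURED_COUNTS: Dict[int, int] = {
    0: 1_451_297, 1: 350_081, 2: 65_567, 3: 21_411, 4: 7_862,
    5: 2_991, 6: 1_232, 7: 514, 8: 234, 9: 111, 10: 239,
}


def quantile_from_counts(counts: Dict[int, int], q: float) -> float:
    """q-quantile of the sample described by a value -> count table.

    Linear interpolation between the order statistics at 0-based positions
    floor((n - 1) * q) and the one after it.
    """
    n = sum(counts.values())
    pos = (n - 1) * q
    lo_idx = int(pos)  # floor, since pos >= 0
    frac = pos - lo_idx
    hi_idx = min(lo_idx + 1, n - 1)

    def value_at(idx: int) -> int:
        # Walk the sorted values until the cumulative count passes idx.
        seen = 0
        for value in sorted(counts):
            seen += counts[value]
            if idx < seen:
                return value
        raise IndexError(idx)

    lo, hi = value_at(lo_idx), value_at(hi_idx)
    return lo + (hi - lo) * frac


for q in (0.5, 0.75, 0.95):
    print(q, quantile_from_counts(PERSONS_INJURED_COUNTS, q))
```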

number_of_persons_killed
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0015050967
Minimum: 0
Maximum: 8
Zeros: 1898797
Zeros (%): 99.9%
Negative: 0
Negative (%): 0.0%
Memory size: 30.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.041033381
Coefficient of variation (CV): 27.262954
Kurtosis: 1955.7404
Mean: 0.0015050967
Median Absolute Deviation (MAD): 0
Skewness: 33.868688
Sum: 2862
Variance: 0.0016837383
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values

Value | Count | Frequency (%)
0 | 1898797 | 99.9%
1 | 2652 | 0.1%
2 | 71 | < 0.1%
3 | 13 | < 0.1%
4 | 4 | < 0.1%
8 | 1 | < 0.1%
5 | 1 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1898797 | 99.9%
1 | 2652 | 0.1%
2 | 71 | < 0.1%
3 | 13 | < 0.1%
4 | 4 | < 0.1%
5 | 1 | < 0.1%
8 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
8 | 1 | < 0.1%
5 | 1 | < 0.1%
4 | 4 | < 0.1%
3 | 13 | < 0.1%
2 | 71 | < 0.1%
1 | 2652 | 0.1%
0 | 1898797 | 99.9%
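number_of_persons_killed has only seven distinct values, so the complete value-count table above is enough to reproduce its summary statistics. The sketch below computes the mean, the sample variance and the bias-adjusted Fisher-Pearson skewness G1 (the form pandas' `Series.skew` uses). Whether ydata-profiling uses exactly this estimator is an assumption, but the result should agree with the reported Skewness to within rounding.

```python
import math
from typing import Dict, Tuple

# Complete value -> count table for number_of_persons_killed (all 7 values).
PERSONS_KILLED_COUNTS: Dict[int, int] = {
    0: 1_898_797, 1: 2_652, 2: 71, 3: 13, 4: 4, 5: 1, 8: 1,
}


def stats_from_counts(counts: Dict[int, int]) -> Tuple[int, float, float, float]:
    """Return (n, mean, sample variance, adjusted skewness G1)."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    # Central moments of the expanded sample (population form, divided by n).
    m2 = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n
    sample_var = m2 * n / (n - 1)                     # ddof = 1
    g1 = m3 / m2 ** 1.5                               # Fisher-Pearson coefficient
    g1_adj = g1 * math.sqrt(n * (n - 1)) / (n - 2)    # bias-adjusted G1
    return n, mean, sample_var, g1_adj


n, mean, var, skew = stats_from_counts(PERSONS_KILLED_COUNTS)
print(n, round(mean, 10), round(var, 10), round(skew, 4))
```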

number_of_pedestrians_injured
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05736038
Minimum: 0
Maximum: 27
Zeros: 1797087
Zeros (%): 94.5%
Negative: 0
Negative (%): 0.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 27
Range: 27
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.24614218
Coefficient of variation (CV): 4.2911532
Kurtosis: 134.62238
Mean: 0.05736038
Median Absolute Deviation (MAD): 0
Skewness: 5.735457
Sum: 109073
Variance: 0.060585974
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)

Common values

Value | Count | Frequency (%)
0 | 1797087 | 94.5%
1 | 100529 | 5.3%
2 | 3463 | 0.2%
3 | 356 | < 0.1%
4 | 58 | < 0.1%
5 | 22 | < 0.1%
6 | 11 | < 0.1%
7 | 6 | < 0.1%
9 | 2 | < 0.1%
19 | 1 | < 0.1%
Other values (4) | 4 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1797087 | 94.5%
1 | 100529 | 5.3%
2 | 3463 | 0.2%
3 | 356 | < 0.1%
4 | 58 | < 0.1%
5 | 22 | < 0.1%
6 | 11 | < 0.1%
7 | 6 | < 0.1%
8 | 1 | < 0.1%
9 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
27 | 1 | < 0.1%
19 | 1 | < 0.1%
15 | 1 | < 0.1%
13 | 1 | < 0.1%
9 | 2 | < 0.1%
8 | 1 | < 0.1%
7 | 6 | < 0.1%
6 | 11 | < 0.1%
5 | 22 | < 0.1%
4 | 58 | < 0.1%

number_of_pedestrians_killed
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.00075307422
Minimum: 0
Maximum: 6
Zeros: 1900129
Zeros (%): 99.9%
Negative: 0
Negative (%): 0.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.028113548
Coefficient of variation (CV): 37.331709
Kurtosis: 2703.1956
Mean: 0.00075307422
Median Absolute Deviation (MAD): 0
Skewness: 42.473402
Sum: 1432
Variance: 0.00079037158
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common values

Value | Count | Frequency (%)
0 | 1900129 | 99.9%
1 | 1395 | 0.1%
2 | 12 | < 0.1%
4 | 1 | < 0.1%
6 | 1 | < 0.1%
3 | 1 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1900129 | 99.9%
1 | 1395 | 0.1%
2 | 12 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
6 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
6 | 1 | < 0.1%
4 | 1 | < 0.1%
3 | 1 | < 0.1%
2 | 12 | < 0.1%
1 | 1395 | 0.1%
0 | 1900129 | 99.9%

number_of_cyclist_injured
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 119.7 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1901539
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 1845855 | 97.1%
1 | 55075 | 2.9%
2 | 588 | < 0.1%
3 | 20 | < 0.1%
4 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 1845855 | 97.1%
1 | 55075 | 2.9%
2 | 588 | < 0.1%
3 | 20 | < 0.1%
4 | 1 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1901539 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1901539 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1901539 | 100.0%

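The Imbalance percentages for the two cyclist columns are reproduced by the score 1 - H / ln(k). Here H is the Shannon entropy (natural log) of the class frequencies and k is the number of classes. The score is 0 for a uniform distribution and approaches 1 when one class dominates. The formula has not been checked against ydata-profiling's source, so treat it as a reconstruction that happens to match the reported 91.7% and 99.9%. The class counts come from this report's Common Values tables.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """1 - H / ln(k) for class counts; 0 = balanced, near 1 = one class dominates."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two classes")
    total = sum(counts)
    # Shannon entropy in nats; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


cyclist_injured = [1_845_855, 55_075, 588, 20, 1]  # values 0..4
cyclist_killed = [1_901_306, 232, 1]               # values 0..2
print(f"{imbalance_score(cyclist_injured):.1%}")  # 91.7%
print(f"{imbalance_score(cyclist_killed):.1%}")   # 99.9%
```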

number_of_cyclist_killed
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 119.7 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1901539
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 1901306 | > 99.9%
1 | 232 | < 0.1%
2 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 1901306 | > 99.9%
1 | 232 | < 0.1%
2 | 1 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1901539 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1901539 | 100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1901539
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1901306
> 99.9%
1 232
 
< 0.1%
2 1
 
< 0.1%

number_of_motorist_injured
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 29
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.22875944
Minimum: 0
Maximum: 43
Zeros: 1616538
Zeros (%): 85.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0
95th percentile: 1
Maximum: 43
Range: 43
Interquartile range (IQR): 0
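With 85.0% zeros, every quartile of number_of_motorist_injured falls on 0. That is why Q1, the median, Q3, the IQR and the MAD are all 0, while the 95th percentile reaches 1.

The sketch below computes linear-interpolation quantiles (the NumPy/pandas default) directly from a value-count table, without expanding it. As a simplifying assumption, the 229 rows behind "Other values (19)" are lumped at 10. That cannot move any quantile at or below the 95th, but it does mean the sketch's maximum is not the reported 43.

```python
import math

def quantile_from_counts(value_counts, q):
    """Linear-interpolation quantile computed from a {value: count} table
    without expanding it into a full array."""
    items = sorted(value_counts.items())
    n = sum(c for _, c in items)
    h = (n - 1) * q               # fractional 0-based position in the sorted data
    lo, hi = math.floor(h), math.ceil(h)

    def value_at(pos):
        seen = 0
        for value, c in items:
            seen += c
            if pos < seen:
                return value
        raise IndexError(pos)

    v_lo, v_hi = value_at(lo), value_at(hi)
    return v_lo + (h - lo) * (v_hi - v_lo)

# number_of_motorist_injured counts for 0..9 from the report. Hypothetical
# simplification: the 229 "Other values" rows (10..43) are lumped at 10.
injured = {0: 1616538, 1: 191733, 2: 59649, 3: 20742, 4: 7692,
           5: 2945, 6: 1189, 7: 488, 8: 228, 9: 106, 10: 229}
q1, med, q3, p95 = (quantile_from_counts(injured, q) for q in (0.25, 0.5, 0.75, 0.95))
print(q1, med, q3, p95, "IQR =", q3 - q1)
```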

Descriptive statistics

Standard deviation: 0.66666816
Coefficient of variation (CV): 2.914276
Kurtosis: 55.395951
Mean: 0.22875944
Median Absolute Deviation (MAD): 0
Skewness: 4.9336135
Sum: 434995
Variance: 0.44444644
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=29)
Common values
ValueCountFrequency (%)
0 1616538
85.0%
1 191733
 
10.1%
2 59649
 
3.1%
3 20742
 
1.1%
4 7692
 
0.4%
5 2945
 
0.2%
6 1189
 
0.1%
7 488
 
< 0.1%
8 228
 
< 0.1%
9 106
 
< 0.1%
Other values (19) 229
 
< 0.1%
Minimum 10 values
ValueCountFrequency (%)
0 1616538
85.0%
1 191733
 
10.1%
2 59649
 
3.1%
3 20742
 
1.1%
4 7692
 
0.4%
5 2945
 
0.2%
6 1189
 
0.1%
7 488
 
< 0.1%
8 228
 
< 0.1%
9 106
 
< 0.1%
Maximum 10 values
ValueCountFrequency (%)
43 1
 
< 0.1%
34 1
 
< 0.1%
30 1
 
< 0.1%
25 1
 
< 0.1%
24 3
< 0.1%
23 1
 
< 0.1%
22 2
< 0.1%
21 1
 
< 0.1%
20 2
< 0.1%
19 2
< 0.1%

number_of_motorist_killed
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.00060161795
Minimum: 0
Maximum: 5
Zeros: 1900485
Zeros (%): 99.9%
Negative: 0
Negative (%): 0.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0
95th percentile: 0
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.026854409
Coefficient of variation (CV): 44.63698
Kurtosis: 4024.7538
Mean: 0.00060161795
Median Absolute Deviation (MAD): 0
Skewness: 54.459877
Sum: 1144
Variance: 0.00072115927
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Common values
ValueCountFrequency (%)
0 1900485
99.9%
1 983
 
0.1%
2 56
 
< 0.1%
3 12
 
< 0.1%
4 2
 
< 0.1%
5 1
 
< 0.1%
Minimum 10 values
ValueCountFrequency (%)
0 1900485
99.9%
1 983
 
0.1%
2 56
 
< 0.1%
3 12
 
< 0.1%
4 2
 
< 0.1%
5 1
 
< 0.1%
Maximum 10 values
ValueCountFrequency (%)
5 1
 
< 0.1%
4 2
 
< 0.1%
3 12
 
< 0.1%
2 56
 
< 0.1%
1 983
 
0.1%
0 1900485
99.9%
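The value table for number_of_motorist_killed lists every distinct value. That means the whole column can be rebuilt from its counts and the descriptive statistics recomputed.

The sketch makes two assumptions about how the report computes them. First, it uses the sample (n - 1) variance. Second, it uses the bias-corrected skewness and excess kurtosis estimators that pandas applies in Series.skew() and Series.kurt(). The recomputed values agree closely with the figures above, which supports those assumptions.

```python
import math

def moments_from_counts(value_counts):
    """Summary statistics of a discrete column given as {value: count}.
    Variance uses n - 1; skewness and excess kurtosis use the bias-corrected
    formulas (written here with sums of powered deviations, as pandas does)."""
    n = sum(value_counts.values())
    total = sum(v * c for v, c in value_counts.items())
    mean = total / n
    # Count-weighted sums of powered deviations from the mean.
    m2 = sum(c * (v - mean) ** 2 for v, c in value_counts.items())
    m3 = sum(c * (v - mean) ** 3 for v, c in value_counts.items())
    m4 = sum(c * (v - mean) ** 4 for v, c in value_counts.items())
    variance = m2 / (n - 1)
    std = math.sqrt(variance)
    skew = n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5
    kurt = (n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"n": n, "sum": total, "mean": mean, "variance": variance,
            "std": std, "cv": std / mean, "skewness": skew, "kurtosis": kurt}

# Complete value counts of number_of_motorist_killed from the report.
killed = {0: 1900485, 1: 983, 2: 56, 3: 12, 4: 2, 5: 1}
stats = moments_from_counts(killed)
for key, value in stats.items():
    print(f"{key}: {value:.8g}")
```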
Distinct: 55
Distinct (%): < 0.1%
Missing: 642751
Missing (%): 33.8%
Memory size: 131.4 MiB

Length

Max length: 53
Median length: 36
Mean length: 24.03544
Min length: 5

Characters and Unicode

Total characters: 30255523
Distinct characters: 30
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: passing too closely
2nd row: driver inexperience
3rd row: passing too closely
4th row: passing or lane usage improper
5th row: turning improperly
ValueCountFrequency (%)
driver 421819
 
13.3%
inattention/distraction 391028
 
12.3%
too 147831
 
4.7%
closely 147831
 
4.7%
to 138461
 
4.4%
failure 121917
 
3.9%
yield 116233
 
3.7%
right-of-way 116233
 
3.7%
passing 105963
 
3.3%
following 96910
 
3.1%
Other values (92) 1362237
43.0%
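The word table above treats each whitespace-separated token as a word. That is why "inattention/distraction" and "right-of-way" each count once. The vehicle-type word tables further on also show lower-cased tokens, even though the raw values are capitalized.

This is a minimal sketch, assuming lower-casing followed by whitespace splitting. ydata-profiling's actual tokenizer may differ in details such as stop-word handling. The sample strings are the first rows shown above plus one hypothetical missing value.

```python
from collections import Counter

def word_counts(values):
    """Lower-case each non-missing string and split on whitespace.
    Punctuation stays inside tokens, so 'right-of-way' is one word."""
    counter = Counter()
    for value in values:
        if value is None:
            continue  # missing cells contribute no words
        counter.update(value.lower().split())
    return counter

sample = ["passing too closely", "driver inexperience", "passing too closely",
          "passing or lane usage improper", "turning improperly", None]
print(word_counts(sample).most_common(3))
```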

Most occurring characters

ValueCountFrequency (%)
i 3466096
11.5%
t 2849704
 
9.4%
n 2611557
 
8.6%
e 2523261
 
8.3%
r 2371749
 
7.8%
o 2305062
 
7.6%
1907675
 
6.3%
a 1896181
 
6.3%
s 1340975
 
4.4%
d 1308800
 
4.3%
Other values (20) 7674463
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 27619565
91.3%
Space Separator 1907675
 
6.3%
Other Punctuation 490159
 
1.6%
Dash Punctuation 234044
 
0.8%
Open Punctuation 2040
 
< 0.1%
Close Punctuation 2040
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 3466096
12.5%
t 2849704
10.3%
n 2611557
9.5%
e 2523261
9.1%
r 2371749
8.6%
o 2305062
8.3%
a 1896181
 
6.9%
s 1340975
 
4.9%
d 1308800
 
4.7%
l 1265587
 
4.6%
Other values (15) 5680593
20.6%
Space Separator
ValueCountFrequency (%)
1907675
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 490159
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 234044
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2040
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2040
100.0%
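The category breakdown uses Unicode general categories. Python exposes these through unicodedata.category() as two-letter codes (Ll, Zs, Po, Pd, Ps, Pe, and so on).

The sketch maps those codes to the long names used in this report, then tallies characters and categories over a list of strings. The two inputs are contributing-factor values that appear in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter", "Lu": "Uppercase Letter", "Nd": "Decimal Number",
    "Zs": "Space Separator", "Po": "Other Punctuation", "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Sk": "Modifier Symbol",
    "Cc": "Control", "So": "Other Symbol",
}

def character_profile(values):
    """Count characters and their Unicode general categories across strings."""
    chars = Counter()
    for value in values:
        chars.update(value)  # iterating a str yields its characters
    categories = Counter()
    for ch, count in chars.items():
        code = unicodedata.category(ch)
        categories[CATEGORY_NAMES.get(code, code)] += count
    return chars, categories

chars, categories = character_profile(["driver inattention/distraction",
                                       "failure to yield right-of-way"])
print(categories.most_common())
```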

Most occurring scripts

ValueCountFrequency (%)
Latin 27619565
91.3%
Common 2635958
 
8.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 3466096
12.5%
t 2849704
10.3%
n 2611557
9.5%
e 2523261
9.1%
r 2371749
8.6%
o 2305062
8.3%
a 1896181
 
6.9%
s 1340975
 
4.9%
d 1308800
 
4.7%
l 1265587
 
4.6%
Other values (15) 5680593
20.6%
Common
ValueCountFrequency (%)
1907675
72.4%
/ 490159
 
18.6%
- 234044
 
8.9%
( 2040
 
0.1%
) 2040
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 30255523
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 3466096
11.5%
t 2849704
 
9.4%
n 2611557
 
8.6%
e 2523261
 
8.3%
r 2371749
 
7.8%
o 2305062
 
7.6%
1907675
 
6.3%
a 1896181
 
6.3%
s 1340975
 
4.4%
d 1308800
 
4.3%
Other values (20) 7674463
25.4%
Distinct: 55
Distinct (%): < 0.1%
Missing: 1650130
Missing (%): 86.8%
Memory size: 84.3 MiB

Length

Max length: 53
Median length: 43
Mean length: 24.092757
Min length: 5

Characters and Unicode

Total characters: 6057136
Distinct characters: 30
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: other vehicular
2nd row: driver inattention/distraction
3rd row: driver inattention/distraction
4th row: passing or lane usage improper
5th row: driver inattention/distraction
ValueCountFrequency (%)
driver 93346
 
15.2%
inattention/distraction 87133
 
14.2%
other 30279
 
4.9%
vehicular 29369
 
4.8%
too 24745
 
4.0%
closely 24745
 
4.0%
passing 20071
 
3.3%
to 19525
 
3.2%
lane 18105
 
3.0%
following 16480
 
2.7%
Other values (92) 249686
40.7%

Most occurring characters

ValueCountFrequency (%)
i 715076
11.8%
t 609053
10.1%
n 526981
 
8.7%
r 519061
 
8.6%
e 513871
 
8.5%
o 456824
 
7.5%
a 374929
 
6.2%
362075
 
6.0%
d 272409
 
4.5%
s 264969
 
4.4%
Other values (20) 1441888
23.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5553271
91.7%
Space Separator 362075
 
6.0%
Other Punctuation 109409
 
1.8%
Dash Punctuation 31865
 
0.5%
Open Punctuation 258
 
< 0.1%
Close Punctuation 258
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 715076
12.9%
t 609053
11.0%
n 526981
9.5%
r 519061
9.3%
e 513871
9.3%
o 456824
8.2%
a 374929
 
6.8%
d 272409
 
4.9%
s 264969
 
4.8%
c 221124
 
4.0%
Other values (15) 1078974
19.4%
Space Separator
ValueCountFrequency (%)
362075
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 109409
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 31865
100.0%
Open Punctuation
ValueCountFrequency (%)
( 258
100.0%
Close Punctuation
ValueCountFrequency (%)
) 258
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5553271
91.7%
Common 503865
 
8.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 715076
12.9%
t 609053
11.0%
n 526981
9.5%
r 519061
9.3%
e 513871
9.3%
o 456824
8.2%
a 374929
 
6.8%
d 272409
 
4.9%
s 264969
 
4.8%
c 221124
 
4.0%
Other values (15) 1078974
19.4%
Common
ValueCountFrequency (%)
362075
71.9%
/ 109409
 
21.7%
- 31865
 
6.3%
( 258
 
0.1%
) 258
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6057136
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 715076
11.8%
t 609053
10.1%
n 526981
 
8.7%
r 519061
 
8.6%
e 513871
 
8.5%
o 456824
 
7.5%
a 374929
 
6.2%
362075
 
6.0%
d 272409
 
4.5%
s 264969
 
4.4%
Other values (20) 1441888
23.8%

contributing_factor_vehicle_3
Categorical

HIGH CORRELATION  MISSING 

Distinct: 47
Distinct (%): 0.5%
Missing: 1892805
Missing (%): 99.5%
Memory size: 72.9 MiB
other vehicular 2625
driver inattention/distraction 1713
following too closely 1597
fatigued/drowsy 624
pavement slippery 327
Other values (42) 1848

Length

Max length: 53
Median length: 43
Mean length: 20.67449
Min length: 5
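Mean length is total characters divided by the number of non-missing values. For contributing_factor_vehicle_3 that is 180571 / (1901539 - 1892805) = 180571 / 8734, or about 20.67449.

The sketch computes max, mean, min and total length for a list of strings, skipping missing entries. It leaves out the median, because the reported Median length (43 here, against a mean near 20.7) does not obviously follow from a plain per-row median. The sample strings come from this report, plus one hypothetical missing entry.

```python
import math

def length_stats(values):
    """Max, mean, min and total string length over non-missing values."""
    lengths = [len(v) for v in values if v is not None]
    total = sum(lengths)
    return {"max": max(lengths), "mean": total / len(lengths),
            "min": min(lengths), "total_characters": total}

stats = length_stats(["other vehicular", "pavement slippery", "unsafe speed", None])
print(stats)

# Mean length = total characters / non-missing rows, using reported figures.
N_ROWS = 1901539
for name, total_chars, missing, reported_mean in [
    ("contributing_factor_vehicle_3", 180571, 1892805, 20.67449),
    ("contributing_factor_vehicle_4", 32503, 1899878, 19.568332),
    ("contributing_factor_vehicle_5", 8676, 1901079, 18.86087),
]:
    mean = total_chars / (N_ROWS - missing)
    print(name, round(mean, 6), math.isclose(mean, reported_mean, rel_tol=1e-6))
```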

Characters and Unicode

Total characters: 180571
Distinct characters: 30
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: passing or lane usage improper
2nd row: following too closely
3rd row: following too closely
4th row: other vehicular
5th row: other vehicular

Common Values

ValueCountFrequency (%)
other vehicular 2625
 
0.1%
driver inattention/distraction 1713
 
0.1%
following too closely 1597
 
0.1%
fatigued/drowsy 624
 
< 0.1%
pavement slippery 327
 
< 0.1%
reaction to uninvolved vehicle 186
 
< 0.1%
unsafe speed 160
 
< 0.1%
driver inexperience 156
 
< 0.1%
outside car distraction 136
 
< 0.1%
failure to yield right-of-way 130
 
< 0.1%
Other values (37) 1080
 
0.1%
(Missing) 1892805
99.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
other 2656
13.4%
vehicular 2625
13.2%
driver 1869
 
9.4%
inattention/distraction 1713
 
8.6%
too 1646
 
8.3%
closely 1646
 
8.3%
following 1597
 
8.0%
fatigued/drowsy 624
 
3.1%
to 343
 
1.7%
pavement 342
 
1.7%
Other values (77) 4833
24.3%

Most occurring characters

ValueCountFrequency (%)
i 17510
 
9.7%
o 17367
 
9.6%
e 16793
 
9.3%
t 15958
 
8.8%
r 14439
 
8.0%
n 11581
 
6.4%
l 11261
 
6.2%
11160
 
6.2%
a 9461
 
5.2%
c 7818
 
4.3%
Other values (20) 47223
26.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 166502
92.2%
Space Separator 11160
 
6.2%
Other Punctuation 2609
 
1.4%
Dash Punctuation 276
 
0.2%
Open Punctuation 12
 
< 0.1%
Close Punctuation 12
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 17510
10.5%
o 17367
10.4%
e 16793
10.1%
t 15958
9.6%
r 14439
 
8.7%
n 11581
 
7.0%
l 11261
 
6.8%
a 9461
 
5.7%
c 7818
 
4.7%
d 6443
 
3.9%
Other values (15) 37871
22.7%
Space Separator
ValueCountFrequency (%)
11160
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 2609
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 276
100.0%
Open Punctuation
ValueCountFrequency (%)
( 12
100.0%
Close Punctuation
ValueCountFrequency (%)
) 12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 166502
92.2%
Common 14069
 
7.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 17510
10.5%
o 17367
10.4%
e 16793
10.1%
t 15958
9.6%
r 14439
 
8.7%
n 11581
 
7.0%
l 11261
 
6.8%
a 9461
 
5.7%
c 7818
 
4.7%
d 6443
 
3.9%
Other values (15) 37871
22.7%
Common
ValueCountFrequency (%)
11160
79.3%
/ 2609
 
18.5%
- 276
 
2.0%
( 12
 
0.1%
) 12
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 180571
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 17510
 
9.7%
o 17367
 
9.6%
e 16793
 
9.3%
t 15958
 
8.8%
r 14439
 
8.0%
n 11581
 
6.4%
l 11261
 
6.2%
11160
 
6.2%
a 9461
 
5.2%
c 7818
 
4.3%
Other values (20) 47223
26.2%

contributing_factor_vehicle_4
Categorical

HIGH CORRELATION  MISSING 

Distinct: 41
Distinct (%): 2.5%
Missing: 1899878
Missing (%): 99.9%
Memory size: 72.6 MiB
other vehicular 595
following too closely 313
driver inattention/distraction 242
fatigued/drowsy 124
pavement slippery 97
Other values (36) 290

Length

Max length: 43
Median length: 30
Mean length: 19.568332
Min length: 5

Characters and Unicode

Total characters: 32503
Distinct characters: 30
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 10
Unique (%): 0.6%

Sample

1st row: other vehicular
2nd row: reaction to uninvolved vehicle
3rd row: other vehicular
4th row: pavement defective
5th row: other vehicular

Common Values

ValueCountFrequency (%)
other vehicular 595
 
< 0.1%
following too closely 313
 
< 0.1%
driver inattention/distraction 242
 
< 0.1%
fatigued/drowsy 124
 
< 0.1%
pavement slippery 97
 
< 0.1%
reaction to uninvolved vehicle 33
 
< 0.1%
unsafe speed 26
 
< 0.1%
driver inexperience 26
 
< 0.1%
outside car distraction 25
 
< 0.1%
alcohol involvement 19
 
< 0.1%
Other values (31) 161
 
< 0.1%
(Missing) 1899878
99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
other 600
16.1%
vehicular 595
16.0%
closely 317
8.5%
too 317
8.5%
following 313
8.4%
driver 268
 
7.2%
inattention/distraction 242
 
6.5%
fatigued/drowsy 124
 
3.3%
pavement 101
 
2.7%
slippery 97
 
2.6%
Other values (66) 751
20.2%

Most occurring characters

ValueCountFrequency (%)
e 3185
 
9.8%
o 3177
 
9.8%
i 2902
 
8.9%
t 2675
 
8.2%
r 2592
 
8.0%
l 2258
 
6.9%
2064
 
6.4%
n 1782
 
5.5%
a 1673
 
5.1%
c 1431
 
4.4%
Other values (20) 8764
27.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 29996
92.3%
Space Separator 2064
 
6.4%
Other Punctuation 401
 
1.2%
Dash Punctuation 34
 
0.1%
Open Punctuation 4
 
< 0.1%
Close Punctuation 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 3185
10.6%
o 3177
10.6%
i 2902
9.7%
t 2675
 
8.9%
r 2592
 
8.6%
l 2258
 
7.5%
n 1782
 
5.9%
a 1673
 
5.6%
c 1431
 
4.8%
h 1283
 
4.3%
Other values (15) 7038
23.5%
Space Separator
ValueCountFrequency (%)
2064
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 401
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 34
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 29996
92.3%
Common 2507
 
7.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 3185
10.6%
o 3177
10.6%
i 2902
9.7%
t 2675
 
8.9%
r 2592
 
8.6%
l 2258
 
7.5%
n 1782
 
5.9%
a 1673
 
5.6%
c 1431
 
4.8%
h 1283
 
4.3%
Other values (15) 7038
23.5%
Common
ValueCountFrequency (%)
2064
82.3%
/ 401
 
16.0%
- 34
 
1.4%
( 4
 
0.2%
) 4
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 32503
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 3185
 
9.8%
o 3177
 
9.8%
i 2902
 
8.9%
t 2675
 
8.2%
r 2592
 
8.0%
l 2258
 
6.9%
2064
 
6.4%
n 1782
 
5.5%
a 1673
 
5.1%
c 1431
 
4.4%
Other values (20) 8764
27.0%

contributing_factor_vehicle_5
Categorical

HIGH CORRELATION  MISSING 

Distinct: 29
Distinct (%): 6.3%
Missing: 1901079
Missing (%): > 99.9%
Memory size: 72.6 MiB
other vehicular 175
following too closely 78
driver inattention/distraction 53
pavement slippery 44
fatigued/drowsy 29
Other values (24) 81

Length

Max length: 43
Median length: 30
Mean length: 18.86087
Min length: 5

Characters and Unicode

Total characters: 8676
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 11
Unique (%): 2.4%
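The three percentages use different denominators. Missing (%) is taken over all 1901539 rows. Distinct (%) and Unique (%) are taken over the non-missing rows only.

For this column, 1901539 - 1901079 = 460 rows are present. Its 29 distinct values therefore give 6.3%, and its 11 unique values give 2.4%. The small sketch below computes the three ratios and checks them against the contributing_factor_vehicle_4 and _5 summaries.

```python
def completeness(n_rows, n_missing, n_distinct, n_unique):
    """Missing (%) over all rows; Distinct (%) and Unique (%) over present rows."""
    present = n_rows - n_missing
    return {
        "missing_pct": 100.0 * n_missing / n_rows,
        "distinct_pct": 100.0 * n_distinct / present,
        "unique_pct": 100.0 * n_unique / present,
    }

N_ROWS = 1901539
cfv4 = completeness(N_ROWS, 1899878, 41, 10)
cfv5 = completeness(N_ROWS, 1901079, 29, 11)
print({k: round(v, 1) for k, v in cfv4.items()})
print({k: round(v, 1) for k, v in cfv5.items()})
```

The missing share for contributing_factor_vehicle_5 rounds to 100.0, which the report displays as "> 99.9%" because some rows are present.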

Sample

1st row: other vehicular
2nd row: other vehicular
3rd row: pavement slippery
4th row: pavement slippery
5th row: following too closely

Common Values

ValueCountFrequency (%)
other vehicular 175
 
< 0.1%
following too closely 78
 
< 0.1%
driver inattention/distraction 53
 
< 0.1%
pavement slippery 44
 
< 0.1%
fatigued/drowsy 29
 
< 0.1%
alcohol involvement 11
 
< 0.1%
obstruction/debris 10
 
< 0.1%
reaction to uninvolved vehicle 9
 
< 0.1%
unsafe speed 9
 
< 0.1%
driver inexperience 8
 
< 0.1%
Other values (19) 34
 
< 0.1%
(Missing) 1901079
> 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
other 176
17.5%
vehicular 175
17.4%
too 80
8.0%
closely 80
8.0%
following 78
7.8%
driver 61
 
6.1%
inattention/distraction 53
 
5.3%
pavement 45
 
4.5%
slippery 44
 
4.4%
fatigued/drowsy 29
 
2.9%
Other values (46) 185
18.4%

Most occurring characters

ValueCountFrequency (%)
e 907
 
10.5%
o 825
 
9.5%
i 738
 
8.5%
r 694
 
8.0%
t 686
 
7.9%
l 623
 
7.2%
546
 
6.3%
n 442
 
5.1%
a 432
 
5.0%
c 383
 
4.4%
Other values (19) 2400
27.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 8022
92.5%
Space Separator 546
 
6.3%
Other Punctuation 95
 
1.1%
Dash Punctuation 9
 
0.1%
Open Punctuation 2
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 907
11.3%
o 825
10.3%
i 738
 
9.2%
r 694
 
8.7%
t 686
 
8.6%
l 623
 
7.8%
n 442
 
5.5%
a 432
 
5.4%
c 383
 
4.8%
h 380
 
4.7%
Other values (14) 1912
23.8%
Space Separator
ValueCountFrequency (%)
546
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 95
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 9
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 8022
92.5%
Common 654
 
7.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 907
11.3%
o 825
10.3%
i 738
 
9.2%
r 694
 
8.7%
t 686
 
8.6%
l 623
 
7.8%
n 442
 
5.5%
a 432
 
5.4%
c 383
 
4.8%
h 380
 
4.7%
Other values (14) 1912
23.8%
Common
ValueCountFrequency (%)
546
83.5%
/ 95
 
14.5%
- 9
 
1.4%
( 2
 
0.3%
) 2
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8676
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 907
 
10.5%
o 825
 
9.5%
i 738
 
8.5%
r 694
 
8.0%
t 686
 
7.9%
l 623
 
7.2%
546
 
6.3%
n 442
 
5.1%
a 432
 
5.0%
c 383
 
4.4%
Other values (19) 2400
27.7%

collision_id
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 1901539
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3256245.3
Minimum: 22
Maximum: 4806433
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: 22
5th percentile: 105069.8
Q1: 3202737.5
median: 3757207
Q3: 4278186.5
95th percentile: 4701111.2
Maximum: 4806433
Range: 4806411
Interquartile range (IQR): 1075449
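Values such as Q1 = 3202737.5 show that quantiles are interpolated between neighbouring order statistics rather than picked from the data. Python's statistics.quantiles() with method="inclusive" uses the same linear interpolation as the NumPy/pandas default.

The input here is a toy example: the eight smallest collision_id values listed further down. The last lines restate the reported IQR and Range as simple differences.

```python
import statistics

def quartile_summary(values):
    """Q1, median, Q3, IQR and range with linear interpolation between
    order statistics ('inclusive' method: min is the 0th, max the 100th)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3,
            "iqr": q3 - q1, "range": max(values) - min(values)}

summary = quartile_summary([22, 23, 25, 26, 27, 28, 29, 30])
print(summary)

# Reported collision_id figures: IQR = Q3 - Q1 and Range = Maximum - Minimum.
print(4278186.5 - 3202737.5, 4806433 - 22)
```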

Descriptive statistics

Standard deviation: 1506534.6
Coefficient of variation (CV): 0.46266004
Kurtosis: 0.17216815
Mean: 3256245.3
Median Absolute Deviation (MAD): 537124
Skewness: -1.2861947
Sum: 6.1918775 × 10^12
Variance: 2.2696464 × 10^12
Monotonicity: Not monotonic
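Several rows of a numeric summary are derived from others:

- Sum = Mean × n
- Variance = (standard deviation)²
- CV = standard deviation / Mean

Recomputing them for collision_id from the rounded Mean and standard deviation reproduces the reported Sum and Variance (both on the order of 10^12) and the CV, up to rounding.

```python
import math

def derived_checks(mean, std, n):
    """Recompute the derived rows of a numeric summary from mean, std and n."""
    return {"sum": mean * n, "variance": std ** 2, "cv": std / mean}

checks = derived_checks(mean=3256245.3, std=1506534.6, n=1901539)
print({k: f"{v:.8g}" for k, v in checks.items()})
```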
Histogram with fixed size bins (bins=50)
Common values
ValueCountFrequency (%)
4675373 1
 
< 0.1%
3318640 1
 
< 0.1%
3324606 1
 
< 0.1%
3325472 1
 
< 0.1%
3311096 1
 
< 0.1%
3322468 1
 
< 0.1%
3315127 1
 
< 0.1%
3316449 1
 
< 0.1%
3323540 1
 
< 0.1%
3315524 1
 
< 0.1%
Other values (1901529) 1901529
> 99.9%
Minimum 10 values
ValueCountFrequency (%)
22 1
< 0.1%
23 1
< 0.1%
25 1
< 0.1%
26 1
< 0.1%
27 1
< 0.1%
28 1
< 0.1%
29 1
< 0.1%
30 1
< 0.1%
31 1
< 0.1%
32 1
< 0.1%
Maximum 10 values
ValueCountFrequency (%)
4806433 1
< 0.1%
4806432 1
< 0.1%
4806429 1
< 0.1%
4806428 1
< 0.1%
4806425 1
< 0.1%
4806423 1
< 0.1%
4806422 1
< 0.1%
4806409 1
< 0.1%
4806408 1
< 0.1%
4806407 1
< 0.1%
Distinct: 1645
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 148.3 MiB

Length

Max length: 38
Median length: 35
Mean length: 16.765991
Min length: 1

Characters and Unicode

Total characters: 31881185
Distinct characters: 77
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3

Unique

Unique: 1001
Unique (%): 0.1%

Sample

1st row: Moped
2nd row: Sedan
3rd row: Sedan
4th row: Sedan
5th row: Sedan
ValueCountFrequency (%)
vehicle 797022
17.9%
sedan 598331
13.5%
utility 591945
13.3%
station 591907
13.3%
wagon/sport 441234
9.9%
passenger 346612
7.8%
151982
 
3.4%
wagon 150726
 
3.4%
sport 150672
 
3.4%
truck 80835
 
1.8%
Other values (946) 542047
12.2%

Most occurring characters

ValueCountFrequency (%)
2553229
 
8.0%
S 2496024
 
7.8%
t 2238694
 
7.0%
i 1885086
 
5.9%
a 1571124
 
4.9%
e 1564629
 
4.9%
E 1517050
 
4.8%
n 1502079
 
4.7%
o 1398623
 
4.4%
T 974139
 
3.1%
Other values (67) 14180508
44.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 15149372
47.5%
Uppercase Letter 13436109
42.1%
Space Separator 2553229
 
8.0%
Other Punctuation 593269
 
1.9%
Decimal Number 53740
 
0.2%
Dash Punctuation 49760
 
0.2%
Open Punctuation 22853
 
0.1%
Close Punctuation 22849
 
0.1%
Modifier Symbol 2
 
< 0.1%
Control 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 2496024
18.6%
E 1517050
11.3%
T 974139
 
7.3%
I 881523
 
6.6%
V 860346
 
6.4%
A 735544
 
5.5%
N 724759
 
5.4%
U 645473
 
4.8%
W 610751
 
4.5%
R 603278
 
4.5%
Other values (18) 3387222
25.2%
Lowercase Letter
ValueCountFrequency (%)
t 2238694
14.8%
i 1885086
12.4%
a 1571124
10.4%
e 1564629
10.3%
n 1502079
9.9%
o 1398623
9.2%
l 917798
6.1%
d 634765
 
4.2%
r 595949
 
3.9%
c 583453
 
3.9%
Other values (15) 2257172
14.9%
Decimal Number
ValueCountFrequency (%)
4 39939
74.3%
6 11407
 
21.2%
2 1887
 
3.5%
3 337
 
0.6%
5 49
 
0.1%
0 45
 
0.1%
1 43
 
0.1%
9 16
 
< 0.1%
8 10
 
< 0.1%
7 7
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 593239
> 99.9%
. 15
 
< 0.1%
# 8
 
< 0.1%
, 3
 
< 0.1%
' 2
 
< 0.1%
? 1
 
< 0.1%
& 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2553229
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 49760
100.0%
Open Punctuation
ValueCountFrequency (%)
( 22853
100.0%
Close Punctuation
ValueCountFrequency (%)
) 22849
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%
Control
ValueCountFrequency (%)
 1
100.0%
Other Symbol
ValueCountFrequency (%)
� 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 28585479
89.7%
Common 3295704
 
10.3%
Cyrillic 2
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 2496024
 
8.7%
t 2238694
 
7.8%
i 1885086
 
6.6%
a 1571124
 
5.5%
e 1564629
 
5.5%
E 1517050
 
5.3%
n 1502079
 
5.3%
o 1398623
 
4.9%
T 974139
 
3.4%
l 917798
 
3.2%
Other values (41) 12520233
43.8%
Common
ValueCountFrequency (%)
2553229
77.5%
/ 593239
 
18.0%
- 49760
 
1.5%
4 39939
 
1.2%
( 22853
 
0.7%
) 22849
 
0.7%
6 11407
 
0.3%
2 1887
 
0.1%
3 337
 
< 0.1%
5 49
 
< 0.1%
Other values (14) 155
 
< 0.1%
Cyrillic
ValueCountFrequency (%)
Х 1
50.0%
Р 1
50.0%
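Python's standard library has no Unicode Script property, so without a third-party package the script breakdown above can only be approximated. The heuristic below reads the character name from unicodedata.name(). Names starting with LATIN or CYRILLIC map to those scripts. Everything else (digits, spaces, punctuation, symbols, and unnamed control characters) is treated as Common.

This is an approximation that covers the characters in this report, not a full implementation of the Script property. The two Cyrillic characters reported here appear to be Х (U+0425) and Р (U+0420). These look identical to Latin X and P, so a check like has_cyrillic() (a hypothetical helper) is one way to find vehicle-type strings containing lookalike letters.

```python
import unicodedata

def approximate_script(ch):
    """Rough script guess from the Unicode character name (heuristic only)."""
    name = unicodedata.name(ch, "")  # control characters have no name
    if name.startswith("LATIN"):
        return "Latin"
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    return "Common"

def has_cyrillic(value):
    """True if any character in the string is classified as Cyrillic."""
    return any(approximate_script(ch) == "Cyrillic" for ch in value)

print([approximate_script(ch) for ch in "Sedan 4x4/\u0425"])
```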

Most occurring blocks

ValueCountFrequency (%)
ASCII 31881182
> 99.9%
Cyrillic 2
 
< 0.1%
Specials 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2553229
 
8.0%
S 2496024
 
7.8%
t 2238694
 
7.0%
i 1885086
 
5.9%
a 1571124
 
4.9%
e 1564629
 
4.9%
E 1517050
 
4.8%
n 1502079
 
4.7%
o 1398623
 
4.4%
T 974139
 
3.1%
Other values (64) 14180505
44.5%
Cyrillic
ValueCountFrequency (%)
Х 1
50.0%
Р 1
50.0%
Specials
ValueCountFrequency (%)
� 1
100.0%
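Unicode blocks are fixed code-point ranges, so the block breakdown needs only ord(). The table below covers the three blocks reported for this column:

- Basic Latin, which the report calls ASCII: U+0000 to U+007F
- Cyrillic: U+0400 to U+04FF
- Specials: U+FFF0 to U+FFFF

The single Specials character is U+FFFD, the replacement character, which usually marks bytes that failed to decode somewhere upstream. In this sketch, characters outside these ranges fall into a catch-all "Other" bucket.

```python
from collections import Counter

# Code-point ranges of the three Unicode blocks reported for this column.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),      # Basic Latin
    (0x0400, 0x04FF, "Cyrillic"),
    (0xFFF0, 0xFFFF, "Specials"),   # includes U+FFFD REPLACEMENT CHARACTER
]

def unicode_block(ch):
    """Name of the block containing ch, or 'Other' if not in BLOCKS."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"

def block_counts(values):
    """Tally block membership over every character of every string."""
    return Counter(unicode_block(ch) for value in values for ch in value)

print(block_counts(["Sedan", "\u0420", "\ufffd"]))
```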

vehicle_type_code_2
Text

MISSING 

Distinct: 1855
Distinct (%): 0.1%
Missing: 375446
Missing (%): 19.7%
Memory size: 132.1 MiB

Length

Max length: 38
Median length: 30
Mean length: 15.946613
Min length: 1

Characters and Unicode

Total characters: 24336014
Distinct characters: 73
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1113
Unique (%): 0.1%

Sample

1st row: Sedan
2nd row: Tractor Truck Diesel
3rd row: Sedan
4th row: Station Wagon/Sport Utility Vehicle
5th row: Station Wagon/Sport Utility Vehicle
ValueCountFrequency (%)
vehicle 582151
16.9%
utility 428255
12.5%
station 428225
12.5%
sedan 411595
12.0%
wagon/sport 311493
9.1%
passenger 263227
7.7%
118016
 
3.4%
wagon 116786
 
3.4%
sport 116732
 
3.4%
truck 80192
 
2.3%
Other values (1018) 579885
16.9%

Most occurring characters

ValueCountFrequency (%)
1921523
 
7.9%
S 1821640
 
7.5%
t 1590468
 
6.5%
i 1368749
 
5.6%
E 1193894
 
4.9%
e 1137927
 
4.7%
a 1107523
 
4.6%
n 1052638
 
4.3%
o 1016075
 
4.2%
T 782475
 
3.2%
Other values (63) 11343102
46.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10993853
45.2%
Uppercase Letter 10852319
44.6%
Space Separator 1921523
 
7.9%
Other Punctuation 429579
 
1.8%
Dash Punctuation 50558
 
0.2%
Decimal Number 44577
 
0.2%
Open Punctuation 21803
 
0.1%
Close Punctuation 21800
 
0.1%
Modifier Symbol 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 1821640
16.8%
E 1193894
11.0%
T 782475
 
7.2%
N 728771
 
6.7%
I 703420
 
6.5%
V 639714
 
5.9%
A 572420
 
5.3%
U 529858
 
4.9%
W 499804
 
4.6%
O 490852
 
4.5%
Other values (16) 2889471
26.6%
Lowercase Letter
ValueCountFrequency (%)
t 1590468
14.5%
i 1368749
12.5%
e 1137927
10.4%
a 1107523
10.1%
n 1052638
9.6%
o 1016075
9.2%
l 654224
 
6.0%
r 458703
 
4.2%
c 448371
 
4.1%
d 441339
 
4.0%
Other values (15) 1717836
15.6%
Decimal Number
ValueCountFrequency (%)
4 32019
71.8%
6 10753
 
24.1%
2 1353
 
3.0%
3 314
 
0.7%
0 61
 
0.1%
1 30
 
0.1%
5 27
 
0.1%
9 8
 
< 0.1%
8 7
 
< 0.1%
7 5
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 429558
> 99.9%
. 11
 
< 0.1%
, 3
 
< 0.1%
' 3
 
< 0.1%
? 2
 
< 0.1%
# 1
 
< 0.1%
& 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
1921523
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 50558
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21803
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21800
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 21846172
89.8%
Common 2489842
 
10.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 1821640
 
8.3%
t 1590468
 
7.3%
i 1368749
 
6.3%
E 1193894
 
5.5%
e 1137927
 
5.2%
a 1107523
 
5.1%
n 1052638
 
4.8%
o 1016075
 
4.7%
T 782475
 
3.6%
N 728771
 
3.3%
Other values (41) 10046012
46.0%
Common
ValueCountFrequency (%)
1921523
77.2%
/ 429558
 
17.3%
- 50558
 
2.0%
4 32019
 
1.3%
( 21803
 
0.9%
) 21800
 
0.9%
6 10753
 
0.4%
2 1353
 
0.1%
3 314
 
< 0.1%
0 61
 
< 0.1%
Other values (12) 100
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24336014
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1921523
 
7.9%
S 1821640
 
7.5%
t 1590468
 
6.5%
i 1368749
 
5.6%
E 1193894
 
4.9%
e 1137927
 
4.7%
a 1107523
 
4.6%
n 1052638
 
4.3%
o 1016075
 
4.2%
T 782475
 
3.2%
Other values (63) 11343102
46.6%

vehicle_type_code_3
Text

MISSING 

Distinct: 270
Distinct (%): 0.2%
Missing: 1770054
Missing (%): 93.1%
Memory size: 77.9 MiB

Length

Max length: 35
Median length: 30
Mean length: 17.694072
Min length: 2

Characters and Unicode

Total characters: 2326505
Distinct characters: 61
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 162
Unique (%): 0.1%

Sample

1st row: Sedan
2nd row: Sedan
3rd row: Station Wagon/Sport Utility Vehicle
4th row: Sedan
5th row: Sedan
ValueCountFrequency (%)
vehicle 58769
18.5%
utility 46494
14.7%
station 46491
14.7%
sedan 45251
14.3%
wagon/sport 35242
11.1%
passenger 23184
 
7.3%
11326
 
3.6%
wagon 11249
 
3.5%
sport 11248
 
3.5%
truck 4087
 
1.3%
Other values (218) 23886
7.5%

Most occurring characters

ValueCountFrequency (%)
186091
 
8.0%
S 184164
 
7.9%
t 177583
 
7.6%
i 146587
 
6.3%
a 119104
 
5.1%
e 118814
 
5.1%
n 116576
 
5.0%
o 108677
 
4.7%
E 97137
 
4.2%
l 71827
 
3.1%
Other values (51) 999945
43.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1156019
49.7%
Uppercase Letter 931012
40.0%
Space Separator 186091
 
8.0%
Other Punctuation 46570
 
2.0%
Dash Punctuation 2898
 
0.1%
Decimal Number 2589
 
0.1%
Open Punctuation 663
 
< 0.1%
Close Punctuation 663
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 184164
19.8%
E 97137
10.4%
T 65478
 
7.0%
V 61299
 
6.6%
I 59870
 
6.4%
N 54891
 
5.9%
U 50632
 
5.4%
W 49266
 
5.3%
A 48570
 
5.2%
O 38977
 
4.2%
Other values (15) 220728
23.7%
Lowercase Letter
ValueCountFrequency (%)
t 177583
15.4%
i 146587
12.7%
a 119104
10.3%
e 118814
10.3%
n 116576
10.1%
o 108677
9.4%
l 71827
6.2%
d 47391
 
4.1%
r 42798
 
3.7%
c 42477
 
3.7%
Other values (14) 164185
14.2%
Decimal Number
ValueCountFrequency (%)
4 2124
82.0%
6 315
 
12.2%
2 134
 
5.2%
3 12
 
0.5%
8 2
 
0.1%
5 1
 
< 0.1%
0 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
186091
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 46570
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2898
100.0%
Open Punctuation
ValueCountFrequency (%)
( 663
100.0%
Close Punctuation
ValueCountFrequency (%)
) 663
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2087031
89.7%
Common 239474
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 184164
 
8.8%
t 177583
 
8.5%
i 146587
 
7.0%
a 119104
 
5.7%
e 118814
 
5.7%
n 116576
 
5.6%
o 108677
 
5.2%
E 97137
 
4.7%
l 71827
 
3.4%
T 65478
 
3.1%
Other values (39) 881084
42.2%
Common
ValueCountFrequency (%)
186091
77.7%
/ 46570
 
19.4%
- 2898
 
1.2%
4 2124
 
0.9%
( 663
 
0.3%
) 663
 
0.3%
6 315
 
0.1%
2 134
 
0.1%
3 12
 
< 0.1%
8 2
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2326505
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
186091
 
8.0%
S 184164
 
7.9%
t 177583
 
7.6%
i 146587
 
6.3%
a 119104
 
5.1%
e 118814
 
5.1%
n 116576
 
5.0%
o 108677
 
4.7%
E 97137
 
4.2%
l 71827
 
3.1%
Other values (51) 999945
43.0%

vehicle_type_code_4
Text

MISSING 

Distinct: 102
Distinct (%): 0.3%
Missing: 1871192
Missing (%): 98.4%
Memory size: 73.8 MiB

Length

Max length: 35
Median length: 30
Mean length: 18.059248
Min length: 2

Characters and Unicode

Total characters: 548044
Distinct characters: 57
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 50
Unique (%): 0.2%

Sample

1st row: Station Wagon/Sport Utility Vehicle
2nd row: Sedan
3rd row: Station Wagon/Sport Utility Vehicle
4th row: Sedan
5th row: Sedan
ValueCountFrequency (%)
vehicle 13988
18.9%
station 11309
15.3%
utility 11309
15.3%
sedan 11118
15.1%
wagon/sport 8890
12.0%
passenger 5057
 
6.8%
2428
 
3.3%
sport 2419
 
3.3%
wagon 2419
 
3.3%
truck 749
 
1.0%
Other values (101) 4143
 
5.6%

Most occurring characters

ValueCountFrequency (%)
t 44661
 
8.1%
S 43628
 
8.0%
43525
 
7.9%
i 36614
 
6.7%
a 29521
 
5.4%
e 29357
 
5.4%
n 29033
 
5.3%
o 27143
 
5.0%
E 20832
 
3.8%
l 18008
 
3.3%
Other values (47) 225722
41.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 286471
52.3%
Uppercase Letter 205493
37.5%
Space Separator 43525
 
7.9%
Other Punctuation 11318
 
2.1%
Dash Punctuation 583
 
0.1%
Decimal Number 490
 
0.1%
Close Punctuation 82
 
< 0.1%
Open Punctuation 82
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 44661
15.6%
i 36614
12.8%
a 29521
10.3%
e 29357
10.2%
n 29033
10.1%
o 27143
9.5%
l 18008
6.3%
d 11559
 
4.0%
r 10303
 
3.6%
c 10258
 
3.6%
Other values (14) 40014
14.0%
Uppercase Letter
ValueCountFrequency (%)
S 43628
21.2%
E 20832
10.1%
V 14390
 
7.0%
T 13729
 
6.7%
I 12712
 
6.2%
U 12021
 
5.8%
W 11803
 
5.7%
N 11561
 
5.6%
A 10337
 
5.0%
P 8142
 
4.0%
Other values (14) 46338
22.5%
Decimal Number
ValueCountFrequency (%)
4 415
84.7%
6 39
 
8.0%
2 35
 
7.1%
3 1
 
0.2%
Space Separator
ValueCountFrequency (%)
(space) 43525
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 11318
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 583
100.0%
Close Punctuation
ValueCountFrequency (%)
) 82
100.0%
Open Punctuation
ValueCountFrequency (%)
( 82
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 491964
89.8%
Common 56080
 
10.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 44661
 
9.1%
S 43628
 
8.9%
i 36614
 
7.4%
a 29521
 
6.0%
e 29357
 
6.0%
n 29033
 
5.9%
o 27143
 
5.5%
E 20832
 
4.2%
l 18008
 
3.7%
V 14390
 
2.9%
Other values (38) 198777
40.4%
Common
ValueCountFrequency (%)
(space) 43525
77.6%
/ 11318
 
20.2%
- 583
 
1.0%
4 415
 
0.7%
) 82
 
0.1%
( 82
 
0.1%
6 39
 
0.1%
2 35
 
0.1%
3 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 548044
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 44661
 
8.1%
S 43628
 
8.0%
(space) 43525
 
7.9%
i 36614
 
6.7%
a 29521
 
5.4%
e 29357
 
5.4%
n 29033
 
5.3%
o 27143
 
5.0%
E 20832
 
3.8%
l 18008
 
3.3%
Other values (47) 225722
41.2%

vehicle_type_code_5
Text

MISSING 

Distinct: 70
Distinct (%): 0.8%
Missing: 1893051
Missing (%): 99.6%
Memory size: 72.9 MiB

Length

Max length: 35
Median length: 30
Mean length: 18.304665
Min length: 2

Characters and Unicode

Total characters: 155370
Distinct characters: 55
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 34
Unique (%): 0.4%

Sample

1st row: Station Wagon/Sport Utility Vehicle
2nd row: Station Wagon/Sport Utility Vehicle
3rd row: Sedan
4th row: Sedan
5th row: Station Wagon/Sport Utility Vehicle
ValueCountFrequency (%)
vehicle 3871
18.5%
station 3300
15.8%
utility 3300
15.8%
sedan 3178
15.2%
wagon/sport 2595
12.4%
passenger 1269
 
6.1%
wagon 707
 
3.4%
706
 
3.4%
sport 705
 
3.4%
truck 248
 
1.2%
Other values (71) 1014
 
4.9%

Most occurring characters

ValueCountFrequency (%)
t 13047
 
8.4%
(space) 12411
 
8.0%
S 12260
 
7.9%
i 10695
 
6.9%
a 8538
 
5.5%
e 8497
 
5.5%
n 8421
 
5.4%
o 7947
 
5.1%
l 5257
 
3.4%
E 5206
 
3.4%
Other values (45) 63091
40.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 83473
53.7%
Uppercase Letter 55856
36.0%
Space Separator 12411
 
8.0%
Other Punctuation 3301
 
2.1%
Dash Punctuation 195
 
0.1%
Decimal Number 108
 
0.1%
Open Punctuation 13
 
< 0.1%
Close Punctuation 13
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 13047
15.6%
i 10695
12.8%
a 8538
10.2%
e 8497
10.2%
n 8421
10.1%
o 7947
9.5%
l 5257
6.3%
d 3283
 
3.9%
c 3061
 
3.7%
r 3007
 
3.6%
Other values (14) 11720
14.0%
Uppercase Letter
ValueCountFrequency (%)
S 12260
21.9%
E 5206
9.3%
T 3996
 
7.2%
V 3964
 
7.1%
I 3477
 
6.2%
U 3444
 
6.2%
W 3377
 
6.0%
N 2949
 
5.3%
A 2783
 
5.0%
O 2278
 
4.1%
Other values (13) 12122
21.7%
Decimal Number
ValueCountFrequency (%)
4 92
85.2%
2 9
 
8.3%
6 7
 
6.5%
Space Separator
ValueCountFrequency (%)
(space) 12411
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 3301
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 195
100.0%
Open Punctuation
ValueCountFrequency (%)
( 13
100.0%
Close Punctuation
ValueCountFrequency (%)
) 13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 139329
89.7%
Common 16041
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 13047
 
9.4%
S 12260
 
8.8%
i 10695
 
7.7%
a 8538
 
6.1%
e 8497
 
6.1%
n 8421
 
6.0%
o 7947
 
5.7%
l 5257
 
3.8%
E 5206
 
3.7%
T 3996
 
2.9%
Other values (37) 55465
39.8%
Common
ValueCountFrequency (%)
(space) 12411
77.4%
/ 3301
 
20.6%
- 195
 
1.2%
4 92
 
0.6%
( 13
 
0.1%
) 13
 
0.1%
2 9
 
0.1%
6 7
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 155370
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 13047
 
8.4%
(space) 12411
 
8.0%
S 12260
 
7.9%
i 10695
 
6.9%
a 8538
 
5.5%
e 8497
 
5.5%
n 8421
 
5.4%
o 7947
 
5.1%
l 5257
 
3.4%
E 5206
 
3.4%
Other values (45) 63091
40.6%

crash_hour
Real number (ℝ)

ZEROS 

Distinct: 24
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.181395
Minimum: 0
Maximum: 23
Zeros: 62782
Zeros (%): 3.3%
Negative: 0
Negative (%): 0.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 9
median: 14
Q3: 18
95-th percentile: 22
Maximum: 23
Range: 23
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 5.7787509
Coefficient of variation (CV): 0.43840209
Kurtosis: -0.44228475
Mean: 13.181395
Median Absolute Deviation (MAD): 4
Skewness: -0.43622471
Sum: 25064936
Variance: 33.393963
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
Common values

Value               Count      Frequency (%)
16                  135576     7.1%
17                  132832     7.0%
14                  126151     6.6%
15                  118366     6.2%
18                  116843     6.1%
13                  109281     5.7%
8                   104431     5.5%
12                  104328     5.5%
9                   100246     5.3%
11                  98034      5.2%
Other values (14)   755451     39.7%

Minimum 10 values

Value               Count      Frequency (%)
0                   62782      3.3%
1                   33564      1.8%
2                   25882      1.4%
3                   22807      1.2%
4                   25747      1.4%
5                   27729      1.5%
6                   42341      2.2%
7                   58147      3.1%
8                   104431     5.5%
9                   100246     5.3%

Maximum 10 values

Value               Count      Frequency (%)
23                  52838      2.8%
22                  62880      3.3%
21                  69162      3.6%
20                  81066      4.3%
19                  96700      5.1%
18                  116843     6.1%
17                  132832     7.0%
16                  135576     7.1%
15                  118366     6.2%
14                  126151     6.6%
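The three tables above list the ten most common hours, the ten smallest values and the ten largest values. Together they cover 23 of the 24 hours; hour 10 appears in none of them, but its count follows from the row total. The Python sketch below rebuilds the hourly distribution from the reported counts and checks it against the reported Sum, Mean and quantiles. The quantile helper assumes linear interpolation between neighbouring order statistics (the pandas default), which is an assumption about how the profiler computed them; for this data every requested quantile falls between two equal values, so the choice of method does not change the result.

```python
# Hour -> number of crashes, copied from the crash_hour tables above.
counts = {
    0: 62782, 1: 33564, 2: 25882, 3: 22807, 4: 25747, 5: 27729,
    6: 42341, 7: 58147, 8: 104431, 9: 100246,
    11: 98034, 12: 104328, 13: 109281,
    14: 126151, 15: 118366, 16: 135576, 17: 132832, 18: 116843,
    19: 96700, 20: 81066, 21: 69162, 22: 62880, 23: 52838,
}
n = 1_901_539  # number of observations

# Hour 10 is in none of the tables, so its count is whatever remains.
counts[10] = n - sum(counts.values())

total = sum(hour * c for hour, c in counts.items())
mean = total / n


def value_at(index):
    """Return the value at a 0-based position of the sorted, expanded data."""
    cumulative = 0
    for hour in sorted(counts):
        cumulative += counts[hour]
        if index < cumulative:
            return hour
    raise IndexError(index)


def quantile(p):
    """Quantile with linear interpolation between neighbouring order statistics."""
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return value_at(lo) + (pos - lo) * (value_at(hi) - value_at(lo))


print(counts[10], total, round(mean, 6))
# 93806 25064936 13.181395
print([quantile(p) for p in (0.05, 0.25, 0.5, 0.75, 0.95)])
# [1.0, 9.0, 14.0, 18.0, 22.0]
```

The recovered count for hour 10 is 93,806, and with it the weighted sum reproduces the reported Sum of 25,064,936 exactly.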

crash_day
Categorical

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 130.8 MiB
Friday: 303250
Thursday: 283437
Tuesday: 279745
Wednesday: 276856
Monday: 271784
Other values (2): 486467

Length

Max length: 9
Median length: 8
Mean length: 7.1523682
Min length: 6

Characters and Unicode

Total characters: 13600507
Distinct characters: 17
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Wednesday
2nd row: Saturday
3rd row: Tuesday
4th row: Tuesday
5th row: Tuesday

Common Values

Value               Count      Frequency (%)
Friday              303250     15.9%
Thursday            283437     14.9%
Tuesday             279745     14.7%
Wednesday           276856     14.6%
Monday              271784     14.3%
Saturday            257043     13.5%
Sunday              229424     12.1%
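Because this table lists every category, the Total characters and Mean length reported for crash_day can be reproduced from it by weighting each day name's length by its count. A minimal Python sketch (the dictionary simply restates the table):

```python
# Day name -> count, copied from the Common Values table above.
counts = {
    "Friday": 303250, "Thursday": 283437, "Tuesday": 279745,
    "Wednesday": 276856, "Monday": 271784, "Saturday": 257043,
    "Sunday": 229424,
}
n = sum(counts.values())

# Each row contributes len(day_name) characters.
total_characters = sum(len(day) * c for day, c in counts.items())
mean_length = total_characters / n

# Frequency (%) as displayed, rounded to one decimal place.
shares = {day: round(100 * c / n, 1) for day, c in counts.items()}

print(n, total_characters, round(mean_length, 7))
# 1901539 13600507 7.1523682
print(shares)
```

The weighted sum comes to 13,600,507 characters, matching Total characters above.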

Length

Histogram of lengths of the category

Words
ValueCountFrequency (%)
friday 303250
15.9%
thursday 283437
14.9%
tuesday 279745
14.7%
wednesday 276856
14.6%
monday 271784
14.3%
saturday 257043
13.5%
sunday 229424
12.1%

Most occurring characters

ValueCountFrequency (%)
d 2178395
16.0%
a 2158582
15.9%
y 1901539
14.0%
u 1049649
7.7%
r 843730
 
6.2%
s 840038
 
6.2%
e 833457
 
6.1%
n 778064
 
5.7%
T 563182
 
4.1%
S 486467
 
3.6%
Other values (7) 1967404
14.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11698968
86.0%
Uppercase Letter 1901539
 
14.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
d 2178395
18.6%
a 2158582
18.5%
y 1901539
16.3%
u 1049649
9.0%
r 843730
 
7.2%
s 840038
 
7.2%
e 833457
 
7.1%
n 778064
 
6.7%
i 303250
 
2.6%
h 283437
 
2.4%
Other values (2) 528827
 
4.5%
Uppercase Letter
ValueCountFrequency (%)
T 563182
29.6%
S 486467
25.6%
F 303250
15.9%
W 276856
14.6%
M 271784
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 13600507
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
d 2178395
16.0%
a 2158582
15.9%
y 1901539
14.0%
u 1049649
7.7%
r 843730
 
6.2%
s 840038
 
6.2%
e 833457
 
6.1%
n 778064
 
5.7%
T 563182
 
4.1%
S 486467
 
3.6%
Other values (7) 1967404
14.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13600507
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
d 2178395
16.0%
a 2158582
15.9%
y 1901539
14.0%
u 1049649
7.7%
r 843730
 
6.2%
s 840038
 
6.2%
e 833457
 
6.1%
n 778064
 
5.7%
T 563182
 
4.1%
S 486467
 
3.6%
Other values (7) 1967404
14.5%

crash_month
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 129.1 MiB
October: 172792
July: 168803
September: 167637
August: 167294
December: 165401
Other values (7): 1059612

Length

Max length: 9
Median length: 7
Mean length: 6.1817133
Min length: 3

Characters and Unicode

Total characters: 11754769
Distinct characters: 26
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: November
2nd row: September
3rd row: December
4th row: December
5th row: December

Common Values

Value               Count      Frequency (%)
October             172792     9.1%
July                168803     8.9%
September           167637     8.8%
August              167294     8.8%
December            165401     8.7%
November            164778     8.7%
June                160323     8.4%
May                 158408     8.3%
March               152600     8.0%
January             151778     8.0%
Other values (2)    271725     14.3%

Length

Histogram of lengths of the category

Words
ValueCountFrequency (%)
october 172792
9.1%
july 168803
8.9%
september 167637
8.8%
august 167294
8.8%
december 165401
8.7%
november 164778
8.7%
june 160323
8.4%
may 158408
8.3%
march 152600
8.0%
january 151778
8.0%
Other values (2) 271725
14.3%

Most occurring characters

ValueCountFrequency (%)
e 1800284
15.3%
r 1385210
11.8%
u 953991
 
8.1%
b 809107
 
6.9%
a 753063
 
6.4%
y 617488
 
5.3%
t 507723
 
4.3%
m 497816
 
4.2%
c 490793
 
4.2%
J 480904
 
4.1%
Other values (16) 3458390
29.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9853230
83.8%
Uppercase Letter 1901539
 
16.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1800284
18.3%
r 1385210
14.1%
u 953991
9.7%
b 809107
8.2%
a 753063
7.6%
y 617488
 
6.3%
t 507723
 
5.2%
m 497816
 
5.1%
c 490793
 
5.0%
o 337570
 
3.4%
Other values (8) 1700185
17.3%
Uppercase Letter
ValueCountFrequency (%)
J 480904
25.3%
M 311008
16.4%
A 300520
15.8%
O 172792
 
9.1%
S 167637
 
8.8%
D 165401
 
8.7%
N 164778
 
8.7%
F 138499
 
7.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 11754769
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1800284
15.3%
r 1385210
11.8%
u 953991
 
8.1%
b 809107
 
6.9%
a 753063
 
6.4%
y 617488
 
5.3%
t 507723
 
4.3%
m 497816
 
4.2%
c 490793
 
4.2%
J 480904
 
4.1%
Other values (16) 3458390
29.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11754769
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1800284
15.3%
r 1385210
11.8%
u 953991
 
8.1%
b 809107
 
6.9%
a 753063
 
6.4%
y 617488
 
5.3%
t 507723
 
4.3%
m 497816
 
4.2%
c 490793
 
4.2%
J 480904
 
4.1%
Other values (16) 3458390
29.4%

crash_year
Real number (ℝ)

HIGH CORRELATION 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2017.4449
Minimum: 2012
Maximum: 2025
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: 2012
5-th percentile: 2013
Q1: 2015
median: 2017
Q3: 2020
95-th percentile: 2024
Maximum: 2025
Range: 13
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.3444729
Coefficient of variation (CV): 0.0016577766
Kurtosis: -0.73977256
Mean: 2017.4449
Median Absolute Deviation (MAD): 2
Skewness: 0.33249917
Sum: 3.8362501 × 10⁹
Variance: 11.185499
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Common values

Value               Count      Frequency (%)
2017                212566     11.2%
2018                212516     11.2%
2019                192239     10.1%
2016                188544     9.9%
2015                182642     9.6%
2014                171836     9.0%
2013                171563     9.0%
2020                102669     5.4%
2021                99559      5.2%
2022                91590      4.8%
Other values (4)    275815     14.5%

Minimum 10 values

Value               Count      Frequency (%)
2012                85341      4.5%
2013                171563     9.0%
2014                171836     9.0%
2015                182642     9.6%
2016                188544     9.9%
2017                212566     11.2%
2018                212516     11.2%
2019                192239     10.1%
2020                102669     5.4%
2021                99559      5.2%

Maximum 10 values

Value               Count      Frequency (%)
2025                21115      1.1%
2024                81948      4.3%
2023                87411      4.6%
2022                91590      4.8%
2021                99559      5.2%
2020                102669     5.4%
2019                192239     10.1%
2018                212516     11.2%
2017                212566     11.2%
2016                188544     9.9%
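Every year from 2012 to 2025 appears in the tables above, so the full distribution is available and the descriptive statistics can be recomputed directly. The sketch below uses the sample variance (ddof=1), which matches the reported Variance to the printed precision. Because CV = standard deviation / mean depends on where zero sits on the scale, it comes out very small for calendar years and says little about spread here; the standard deviation of about 3.3 years is the more informative figure.

```python
import math

# Year -> crash count; all fourteen years appear in the tables above.
counts = {
    2012: 85341, 2013: 171563, 2014: 171836, 2015: 182642,
    2016: 188544, 2017: 212566, 2018: 212516, 2019: 192239,
    2020: 102669, 2021: 99559, 2022: 91590, 2023: 87411,
    2024: 81948, 2025: 21115,
}
n = sum(counts.values())

total = sum(year * c for year, c in counts.items())  # the reported Sum
mean = total / n
# Sample variance (ddof=1) over the expanded data.
variance = sum(c * (year - mean) ** 2 for year, c in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(n, total)
# 1901539 3836250079
print(round(mean, 4), round(std, 7), round(variance, 6), cv)
```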

holiday_name
Categorical

HIGH CORRELATION  MISSING 

Distinct: 12
Distinct (%): < 0.1%
Missing: 1856022
Missing (%): 97.6%
Memory size: 74.3 MiB
Veterans Day: 5076
Lincoln's Birthday: 4748
Labour Day: 4433
Columbus Day: 4416
Independence Day: 4205
Other values (7): 22639

Length

Max length: 36
Median length: 21
Mean length: 15.858427
Min length: 10

Characters and Unicode

Total characters: 721828
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Labour Day
2nd row: Labour Day
3rd row: Independence Day
4th row: Independence Day
5th row: Independence Day

Common Values

Value                          Count      Frequency (%)
Veterans Day                   5076       0.3%
Lincoln's Birthday             4748       0.2%
Labour Day                     4433       0.2%
Columbus Day                   4416       0.2%
Independence Day               4205       0.2%
Martin Luther King, Jr. Day    3815       0.2%
Thanksgiving Day               3748       0.2%
New Year's Day                 3745       0.2%
Memorial Day                   3688       0.2%
Washington's Birthday          3512       0.2%
Other values (2)               4131       0.2%
(Missing)                      1856022    97.6%

Length

Histogram of lengths of the category

Words
ValueCountFrequency (%)
day 37257
34.4%
birthday 8260
 
7.6%
independence 5237
 
4.8%
veterans 5076
 
4.7%
lincoln's 4748
 
4.4%
labour 4433
 
4.1%
columbus 4416
 
4.1%
jr 3815
 
3.5%
king 3815
 
3.5%
luther 3815
 
3.5%
Other values (9) 27416
25.3%

Most occurring characters

ValueCountFrequency (%)
a 78697
 
10.9%
(space) 62771
 
8.7%
n 55529
 
7.7%
e 49189
 
6.8%
y 45517
 
6.3%
r 39746
 
5.5%
i 39465
 
5.5%
D 37257
 
5.2%
s 34955
 
4.8%
t 30673
 
4.2%
Other values (28) 248029
34.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 531134
73.6%
Uppercase Letter 108288
 
15.0%
Space Separator 62771
 
8.7%
Other Punctuation 19635
 
2.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 78697
14.8%
n 55529
10.5%
e 49189
9.3%
y 45517
8.6%
r 39746
 
7.5%
i 39465
 
7.4%
s 34955
 
6.6%
t 30673
 
5.8%
h 23466
 
4.4%
o 21829
 
4.1%
Other values (11) 112068
21.1%
Uppercase Letter
ValueCountFrequency (%)
D 37257
34.4%
L 12996
 
12.0%
B 8260
 
7.6%
C 7515
 
6.9%
M 7503
 
6.9%
I 5237
 
4.8%
V 5076
 
4.7%
J 4847
 
4.5%
N 4777
 
4.4%
K 3815
 
3.5%
Other values (3) 11005
 
10.2%
Other Punctuation
ValueCountFrequency (%)
' 12005
61.1%
, 3815
 
19.4%
. 3815
 
19.4%
Space Separator
ValueCountFrequency (%)
(space) 62771
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 639422
88.6%
Common 82406
 
11.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 78697
 
12.3%
n 55529
 
8.7%
e 49189
 
7.7%
y 45517
 
7.1%
r 39746
 
6.2%
i 39465
 
6.2%
D 37257
 
5.8%
s 34955
 
5.5%
t 30673
 
4.8%
h 23466
 
3.7%
Other values (24) 204928
32.0%
Common
ValueCountFrequency (%)
(space) 62771
76.2%
' 12005
 
14.6%
, 3815
 
4.6%
. 3815
 
4.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 721828
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 78697
 
10.9%
(space) 62771
 
8.7%
n 55529
 
7.7%
e 49189
 
6.8%
y 45517
 
6.3%
r 39746
 
5.5%
i 39465
 
5.5%
D 37257
 
5.2%
s 34955
 
4.8%
t 30673
 
4.2%
Other values (28) 248029
34.4%

is_public_holiday
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 119.7 MiB
0: 1856022
1: 45517

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1901539
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count      Frequency (%)
0       1856022    97.6%
1       45517      2.4%
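The 45,517 rows flagged 1 equal the number of non-missing holiday_name values (1,901,539 - 1,856,022), which is consistent with the flag marking rows where a holiday name is present. A minimal Python sketch of that rule; the rule itself is an assumption inferred from the counts, public_holiday_flag is a hypothetical name, and missing names are represented as None rather than NaN:

```python
from typing import Optional


def public_holiday_flag(holiday_name: Optional[str]) -> int:
    """Hypothetical rule: 1 if a holiday name is present, else 0."""
    return 0 if holiday_name is None else 1


# Toy rows; the names come from the holiday_name values above.
rows = ["Veterans Day", None, "Labour Day", None, None]
flags = [public_holiday_flag(name) for name in rows]
print(flags)  # [1, 0, 1, 0, 0]

# The reported counts agree with the rule at the column level.
n_rows, n_missing_names, n_flagged = 1_901_539, 1_856_022, 45_517
assert n_rows - n_missing_names == n_flagged
```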

Length

Histogram of lengths of the category

Words
ValueCountFrequency (%)
0 1856022
97.6%
1 45517
 
2.4%

Most occurring characters

ValueCountFrequency (%)
0 1856022
97.6%
1 45517
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1901539
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1856022
97.6%
1 45517
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Common 1901539
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1856022
97.6%
1 45517
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1901539
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1856022
97.6%
1 45517
 
2.4%

Number_of_involved_Vehicles
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 119.7 MiB
2: 1394608
1: 375446
3: 101138
4: 21859
5: 8488

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1901539
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value   Count      Frequency (%)
2       1394608    73.3%
1       375446     19.7%
3       101138     5.3%
4       21859      1.1%
5       8488       0.4%
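The counts in this table line up with the missing-value counts of the vehicle type columns: vehicle_type_code_4 is present in 1,901,539 - 1,871,192 = 30,347 rows, which equals the 21,859 + 8,488 rows involving four or five vehicles, and vehicle_type_code_5 is present in 8,488 rows, which equals the five-vehicle count. One rule consistent with this is that Number_of_involved_Vehicles counts how many of the vehicle type columns are filled in. The sketch below is a hypothetical version of that rule (count_vehicles is an illustrative name), followed by the column-level check; matching totals do not prove the rule holds row by row.

```python
from typing import Optional, Sequence


def count_vehicles(vehicle_types: Sequence[Optional[str]]) -> int:
    """Hypothetical rule: number of vehicle type columns that are filled in."""
    return sum(v is not None for v in vehicle_types)


# Toy row: two vehicles recorded, the remaining type columns empty.
row = ["Sedan", "Station Wagon/Sport Utility Vehicle", None, None, None]
print(count_vehicles(row))  # 2

# Column-level check against the reported counts.
n_rows = 1_901_539
by_count = {1: 375446, 2: 1394608, 3: 101138, 4: 21859, 5: 8488}
assert sum(by_count.values()) == n_rows
# vehicle_type_code_4 is non-missing in as many rows as have 4 or 5 vehicles ...
assert n_rows - 1_871_192 == by_count[4] + by_count[5]
# ... and vehicle_type_code_5 in as many rows as have 5 vehicles.
assert n_rows - 1_893_051 == by_count[5]
```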

Length

Histogram of lengths of the category

Words
ValueCountFrequency (%)
2 1394608
73.3%
1 375446
 
19.7%
3 101138
 
5.3%
4 21859
 
1.1%
5 8488
 
0.4%

Most occurring characters

ValueCountFrequency (%)
2 1394608
73.3%
1 375446
 
19.7%
3 101138
 
5.3%
4 21859
 
1.1%
5 8488
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1901539
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 1394608
73.3%
1 375446
 
19.7%
3 101138
 
5.3%
4 21859
 
1.1%
5 8488
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common 1901539
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 1394608
73.3%
1 375446
 
19.7%
3 101138
 
5.3%
4 21859
 
1.1%
5 8488
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1901539
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 1394608
73.3%
1 375446
 
19.7%
3 101138
 
5.3%
4 21859
 
1.1%
5 8488
 
0.4%

geometry
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 29.0 MiB

BoroName
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 131.4 MiB
Brooklyn: 582400
Queens: 541364
Manhattan: 401197
Bronx: 283137
Staten Island: 93441

Length

Max length: 13
Median length: 9
Mean length: 7.4405915
Min length: 5

Characters and Unicode

Total characters: 14148575
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Brooklyn
2nd row: Brooklyn
3rd row: Brooklyn
4th row: Bronx
5th row: Brooklyn

Common Values

Value               Count      Frequency (%)
Brooklyn            582400     30.6%
Queens              541364     28.5%
Manhattan           401197     21.1%
Bronx               283137     14.9%
Staten Island       93441      4.9%

Length

Histogram of lengths of the category

Words
ValueCountFrequency (%)
brooklyn 582400
29.2%
queens 541364
27.1%
manhattan 401197
20.1%
bronx 283137
14.2%
staten 93441
 
4.7%
island 93441
 
4.7%

Most occurring characters

ValueCountFrequency (%)
n 2396177
16.9%
o 1447937
10.2%
a 1390473
9.8%
e 1176169
 
8.3%
t 989276
 
7.0%
B 865537
 
6.1%
r 865537
 
6.1%
l 675841
 
4.8%
s 634805
 
4.5%
y 582400
 
4.1%
Other values (10) 3124423
22.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12060154
85.2%
Uppercase Letter 1994980
 
14.1%
Space Separator 93441
 
0.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 2396177
19.9%
o 1447937
12.0%
a 1390473
11.5%
e 1176169
9.8%
t 989276
8.2%
r 865537
 
7.2%
l 675841
 
5.6%
s 634805
 
5.3%
y 582400
 
4.8%
k 582400
 
4.8%
Other values (4) 1319139
10.9%
Uppercase Letter
ValueCountFrequency (%)
B 865537
43.4%
Q 541364
27.1%
M 401197
20.1%
S 93441
 
4.7%
I 93441
 
4.7%
Space Separator
ValueCountFrequency (%)
(space) 93441
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 14055134
99.3%
Common 93441
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 2396177
17.0%
o 1447937
10.3%
a 1390473
9.9%
e 1176169
 
8.4%
t 989276
 
7.0%
B 865537
 
6.2%
r 865537
 
6.2%
l 675841
 
4.8%
s 634805
 
4.5%
y 582400
 
4.1%
Other values (9) 3030982
21.6%
Common
ValueCountFrequency (%)
(space) 93441
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14148575
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 2396177
16.9%
o 1447937
10.2%
a 1390473
9.8%
e 1176169
 
8.3%
t 989276
 
7.0%
B 865537
 
6.1%
r 865537
 
6.1%
l 675841
 
4.8%
s 634805
 
4.5%
y 582400
 
4.1%
Other values (10) 3124423
22.1%

total_injured
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 35
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.63588598
Minimum: 0
Maximum: 86
Zeros: 1451188
Zeros (%): 76.3%
Negative: 0
Negative (%): 0.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 4
Maximum: 86
Range: 86
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.4086404
Coefficient of variation (CV): 2.2152406
Kurtosis: 45.009948
Mean: 0.63588598
Median Absolute Deviation (MAD): 0
Skewness: 4.1077916
Sum: 1209162
Variance: 1.9842679
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=35)
Common values

Value               Count      Frequency (%)
0                   1451188    76.3%
2                   342874     18.0%
4                   64999      3.4%
6                   21362      1.1%
8                   7850       0.4%
1                   7629       0.4%
10                  2990       0.2%
12                  1232       0.1%
14                  514        < 0.1%
3                   276        < 0.1%
Other values (25)   625        < 0.1%

Minimum 10 values

Value               Count      Frequency (%)
0                   1451188    76.3%
1                   7629       0.4%
2                   342874     18.0%
3                   276        < 0.1%
4                   64999      3.4%
5                   30         < 0.1%
6                   21362      1.1%
7                   8          < 0.1%
8                   7850       0.4%
9                   3          < 0.1%

Maximum 10 values

Value               Count      Frequency (%)
86                  1          < 0.1%
68                  1          < 0.1%
64                  1          < 0.1%
54                  1          < 0.1%
50                  1          < 0.1%
48                  3          < 0.1%
46                  1          < 0.1%
44                  3          < 0.1%
42                  1          < 0.1%
40                  2          < 0.1%

total_killed
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0029828471
Minimum: 0
Maximum: 16
Zeros: 1898794
Zeros (%): 99.9%
Negative: 0
Negative (%): 0.0%
Memory size: 29.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 16
Range: 16
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.08154706
Coefficient of variation (CV): 27.338666
Kurtosis: 1996.4772
Mean: 0.0029828471
Median Absolute Deviation (MAD): 0
Skewness: 34.152331
Sum: 5672
Variance: 0.006649923
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Common values

Value               Count      Frequency (%)
0                   1898794    99.9%
2                   2597       0.1%
4                   71         < 0.1%
1                   58         < 0.1%
6                   13         < 0.1%
8                   4          < 0.1%
16                  1          < 0.1%
10                  1          < 0.1%

Minimum 10 values

Value               Count      Frequency (%)
0                   1898794    99.9%
1                   58         < 0.1%
2                   2597       0.1%
4                   71         < 0.1%
6                   13         < 0.1%
8                   4          < 0.1%
10                  1          < 0.1%
16                  1          < 0.1%

Maximum 10 values

Value               Count      Frequency (%)
16                  1          < 0.1%
10                  1          < 0.1%
8                   4          < 0.1%
6                   13         < 0.1%
4                   71         < 0.1%
2                   2597       0.1%
1                   58         < 0.1%
0                   1898794    99.9%
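Since total_killed takes only eight distinct values, all of which appear in the tables above, its moments can be recomputed exactly from the value counts. The sketch below does this in plain Python. The skewness and kurtosis use the bias-adjusted estimators that pandas' Series.skew() and Series.kurt() implement; that the profiler used these is an assumption, but the results agree with the reported Skewness and Kurtosis to within rounding.

```python
import math

# total_killed value -> number of rows, copied from the tables above.
counts = {0: 1898794, 1: 58, 2: 2597, 4: 71, 6: 13, 8: 4, 10: 1, 16: 1}
n = sum(counts.values())

total = sum(x * c for x, c in counts.items())  # the reported Sum
mean = total / n


def central_moment(k):
    """k-th population central moment of the expanded data."""
    return sum(c * (x - mean) ** k for x, c in counts.items()) / n


m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)

variance = m2 * n / (n - 1)  # sample variance (ddof=1)
g1 = m3 / m2 ** 1.5          # population skewness
g2 = m4 / m2 ** 2 - 3        # population excess kurtosis

# Bias-adjusted estimators, in the form used by pandas' skew() and kurt().
skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

print(n, total)
# 1901539 5672
print(mean, variance, skewness, kurtosis)
```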

severity
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 135.7 MiB
No Casualty: 1449128
Injury: 449666
Fatal: 2745

Length

Max length: 11
Median length: 11
Mean length: 9.8089647
Min length: 5

Characters and Unicode

Total characters: 18652129
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Injury
2nd row: No Casualty
3rd row: No Casualty
4th row: Injury
5th row: No Casualty

Common Values

Value               Count      Frequency (%)
No Casualty         1449128    76.2%
Injury              449666     23.6%
Fatal               2745       0.1%
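The Fatal count equals the number of rows with non-zero total_killed (1,901,539 - 1,898,794 = 2,745), and the No Casualty count (1,449,128) is 2,060 below the number of rows with zero total_injured (1,451,188), a gap that fatal crashes without injuries could account for. One labelling rule consistent with these counts gives deaths precedence over injuries. The function below is a hypothetical sketch of that rule, not the code that produced the column:

```python
def severity(total_injured: float, total_killed: float) -> str:
    """Hypothetical rule: any death gives Fatal, otherwise any injury gives Injury."""
    if total_killed > 0:
        return "Fatal"
    if total_injured > 0:
        return "Injury"
    return "No Casualty"


print([severity(0, 0), severity(2, 0), severity(0, 2), severity(4, 2)])
# ['No Casualty', 'Injury', 'Fatal', 'Fatal']

# Column-level checks against the reported counts.
n_rows = 1_901_539
fatal, injury, no_casualty = 2_745, 449_666, 1_449_128
zero_killed, zero_injured = 1_898_794, 1_451_188
assert fatal + injury + no_casualty == n_rows
assert n_rows - zero_killed == fatal
# Zero-injury rows not labelled No Casualty must be fatal under this rule.
assert 0 <= zero_injured - no_casualty <= fatal
```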

Length

Histogram of lengths of the category

Words
ValueCountFrequency (%)
no 1449128
43.2%
casualty 1449128
43.2%
injury 449666
 
13.4%
fatal 2745
 
0.1%

Most occurring characters

ValueCountFrequency (%)
a 2903746
15.6%
u 1898794
10.2%
y 1898794
10.2%
l 1451873
7.8%
t 1451873
7.8%
N 1449128
7.8%
o 1449128
7.8%
(space) 1449128
7.8%
C 1449128
7.8%
s 1449128
7.8%
Other values (5) 1801409
9.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 13852334
74.3%
Uppercase Letter 3350667
 
18.0%
Space Separator 1449128
 
7.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 2903746
21.0%
u 1898794
13.7%
y 1898794
13.7%
l 1451873
10.5%
t 1451873
10.5%
o 1449128
10.5%
s 1449128
10.5%
n 449666
 
3.2%
j 449666
 
3.2%
r 449666
 
3.2%
Uppercase Letter
ValueCountFrequency (%)
N 1449128
43.2%
C 1449128
43.2%
I 449666
 
13.4%
F 2745
 
0.1%
Space Separator
ValueCountFrequency (%)
(space) 1449128
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 17203001
92.2%
Common 1449128
 
7.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 2903746
16.9%
u 1898794
11.0%
y 1898794
11.0%
l 1451873
8.4%
t 1451873
8.4%
N 1449128
8.4%
o 1449128
8.4%
C 1449128
8.4%
s 1449128
8.4%
I 449666
 
2.6%
Other values (4) 1351743
7.9%
Common
ValueCountFrequency (%)
(space) 1449128
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 18652129
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 2903746
15.6%
u 1898794
10.2%
y 1898794
10.2%
l 1451873
7.8%
t 1451873
7.8%
N 1449128
7.8%
o 1449128
7.8%
(space) 1449128
7.8%
C 1449128
7.8%
s 1449128
7.8%
Other values (5) 1801409
9.7%

location_type
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 138.0 MiB
intersection: 1187357
off_street: 404419
mid_block: 309763

Length

Max length: 12
Median length: 12
Mean length: 11.085937
Min length: 9

Characters and Unicode

Total characters: 21080341
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: intersection
2nd row: off_street
3rd row: mid_block
4th row: off_street
5th row: off_street

Common Values

Value               Count      Frequency (%)
intersection        1187357    62.4%
off_street          404419     21.3%
mid_block           309763     16.3%

Length

Histogram of lengths of the category

Words
ValueCountFrequency (%)
intersection 1187357
62.4%
off_street 404419
 
21.3%
mid_block 309763
 
16.3%

Most occurring characters

ValueCountFrequency (%)
t 3183552
15.1%
e 3183552
15.1%
i 2684477
12.7%
n 2374714
11.3%
o 1901539
9.0%
r 1591776
7.6%
s 1591776
7.6%
c 1497120
7.1%
f 808838
 
3.8%
_ 714182
 
3.4%
Other values (5) 1548815
7.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 20366159
96.6%
Connector Punctuation 714182
 
3.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 3183552
15.6%
e 3183552
15.6%
i 2684477
13.2%
n 2374714
11.7%
o 1901539
9.3%
r 1591776
7.8%
s 1591776
7.8%
c 1497120
7.4%
f 808838
 
4.0%
m 309763
 
1.5%
Other values (4) 1239052
 
6.1%
Connector Punctuation
ValueCountFrequency (%)
_ 714182
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 20366159
96.6%
Common 714182
 
3.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 3183552
15.6%
e 3183552
15.6%
i 2684477
13.2%
n 2374714
11.7%
o 1901539
9.3%
r 1591776
7.8%
s 1591776
7.8%
c 1497120
7.4%
f 808838
 
4.0%
m 309763
 
1.5%
Other values (4) 1239052
 
6.1%
Common
ValueCountFrequency (%)
_ 714182
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21080341
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 3183552
15.1%
e 3183552
15.1%
i 2684477
12.7%
n 2374714
11.3%
o 1901539
9.0%
r 1591776
7.6%
s 1591776
7.6%
c 1497120
7.1%
f 808838
 
3.8%
_ 714182
 
3.4%
Other values (5) 1548815
7.3%

Interactions

2025-04-26T02:06:24.479514image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:28.377304image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:42.595917image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:46.410083image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:50.464491image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:54.797336image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:58.866603image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:02.544352image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:06.233567image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:09.886724image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:13.552080image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:17.259481image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:21.080188image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:24.757067image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:28.641179image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:42.885501image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:46.691169image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:50.813314image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:55.101323image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:59.144815image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:02.807175image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:06.519277image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:10.153167image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:13.833710image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:17.534420image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:21.368117image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:25.028609image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:28.921271image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:43.164226image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:46.971946image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:51.132140image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:55.416015image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:59.428195image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:03.080851image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:06.796813image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:10.432968image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:14.105591image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:17.820580image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:21.642898image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:25.308984image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:29.194994image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:43.440606image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:47.256084image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:51.454219image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:55.727135image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:59.697308image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:03.363448image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:07.075868image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:10.692377image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:14.370793image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:18.118234image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:21.915966image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:25.583241image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:29.465689image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:43.728670image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:47.543924image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:51.794434image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:56.064526image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:59.968887image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:03.629447image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:07.356794image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:10.947438image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:14.643724image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:18.415703image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:22.197687image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:25.848217image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:29.732693image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:44.020840image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:47.834486image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:52.124627image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:56.396210image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:00.254358image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:03.902500image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:07.630083image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:11.245360image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:14.915520image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:18.710597image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:22.468691image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:26.121354image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:29.994925image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:44.305670image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:48.112707image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:52.444963image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:56.724002image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:00.519534image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:04.174023image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:07.903426image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:11.517254image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:15.181226image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:18.990004image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:22.733453image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:26.387796image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:30.278574image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:44.590516image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:48.408147image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:52.773261image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:57.036967image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:00.809547image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:04.452926image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:08.176173image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:11.796620image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:15.457252image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:19.269785image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:23.023611image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:26.647806image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:30.535067image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:44.879099image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:48.706629image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:53.085398image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:05:57.365246image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:01.070205image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:04.725203image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:08.448424image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:12.061396image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:15.732349image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:19.543684image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:23.288919image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2025-04-26T02:06:26.906316image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/

Correlations

variable | latitude | longitude | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_motorist_injured | number_of_motorist_killed | collision_id | crash_hour | crash_year | total_injured | total_killed | number_of_cyclist_injured | number_of_cyclist_killed | contributing_factor_vehicle_3 | contributing_factor_vehicle_4 | contributing_factor_vehicle_5 | crash_day | crash_month | holiday_name | is_public_holiday | Number_of_involved_Vehicles | BoroName | severity | location_type
latitude | 1.000 | 0.294 | -0.026 | -0.001 | 0.003 | -0.001 | -0.032 | -0.001 | -0.011 | -0.010 | -0.003 | -0.026 | -0.001 | 0.017 | 0.001 | 0.094 | 0.108 | 0.086 | 0.011 | 0.006 | 0.030 | 0.013 | 0.037 | 0.648 | 0.043 | 0.066
longitude | 0.294 | 1.000 | 0.038 | 0.003 | -0.017 | 0.000 | 0.076 | 0.006 | 0.064 | -0.008 | 0.055 | 0.038 | 0.003 | 0.037 | 0.002 | 0.069 | 0.066 | 0.030 | 0.013 | 0.005 | 0.023 | 0.009 | 0.037 | 0.686 | 0.033 | 0.054
number_of_persons_injured | -0.026 | 0.038 | 1.000 | 0.002 | 0.408 | -0.004 | 0.780 | 0.012 | 0.148 | 0.034 | 0.145 | 0.999 | 0.002 | 0.004 | 0.043 | 0.000 | 0.074 | 0.090 | 0.007 | 0.003 | 0.013 | 0.005 | 0.039 | 0.008 | 0.069 | 0.010
number_of_persons_killed | -0.001 | 0.003 | 0.002 | 1.000 | -0.002 | 0.717 | 0.008 | 0.618 | 0.011 | -0.004 | 0.011 | 0.003 | 0.999 | 0.005 | 0.737 | 0.113 | 0.000 | 0.000 | 0.002 | 0.000 | 0.011 | 0.002 | 0.021 | 0.002 | 0.707 | 0.010
number_of_pedestrians_injured | 0.003 | -0.017 | 0.408 | -0.002 | 1.000 | 0.002 | -0.089 | -0.004 | 0.023 | 0.034 | 0.025 | 0.410 | -0.002 | 0.000 | 0.167 | 0.000 | 0.690 | 0.000 | 0.001 | 0.000 | 0.005 | 0.005 | 0.012 | 0.002 | 0.028 | 0.003
number_of_pedestrians_killed | -0.001 | 0.000 | -0.004 | 0.717 | 0.002 | 1.000 | -0.003 | 0.003 | 0.004 | -0.000 | 0.004 | -0.004 | 0.716 | 0.002 | 0.707 | 0.029 | 0.176 | 1.000 | 0.001 | 0.002 | 0.000 | 0.004 | 0.024 | 0.000 | 0.507 | 0.006
number_of_motorist_injured | -0.032 | 0.076 | 0.780 | 0.008 | -0.089 | -0.003 | 1.000 | 0.018 | 0.119 | -0.000 | 0.115 | 0.783 | 0.008 | 0.004 | 0.000 | 0.000 | 0.043 | 0.102 | 0.007 | 0.003 | 0.015 | 0.005 | 0.038 | 0.008 | 0.066 | 0.011
number_of_motorist_killed | -0.001 | 0.006 | 0.012 | 0.618 | -0.004 | 0.003 | 0.018 | 1.000 | 0.008 | -0.006 | 0.008 | 0.012 | 0.619 | 0.001 | 0.000 | 0.031 | 0.000 | 0.000 | 0.004 | 0.001 | 0.014 | 0.000 | 0.011 | 0.004 | 0.438 | 0.008
collision_id | -0.011 | 0.064 | 0.148 | 0.011 | 0.023 | 0.004 | 0.119 | 0.008 | 1.000 | -0.030 | 0.991 | 0.146 | 0.011 | 0.039 | 0.004 | 0.213 | 0.263 | 0.354 | 0.016 | 0.108 | 0.152 | 0.013 | 0.114 | 0.059 | 0.131 | 0.305
crash_hour | -0.010 | -0.008 | 0.034 | -0.004 | 0.034 | -0.000 | -0.000 | -0.006 | -0.030 | 1.000 | -0.031 | 0.034 | -0.004 | 0.023 | 0.003 | 0.065 | 0.093 | 0.115 | 0.075 | 0.016 | 0.069 | 0.029 | 0.055 | 0.027 | 0.049 | 0.041
crash_year | -0.003 | 0.055 | 0.145 | 0.011 | 0.025 | 0.004 | 0.115 | 0.008 | 0.991 | -0.031 | 1.000 | 0.143 | 0.011 | 0.038 | 0.005 | 0.197 | 0.206 | 0.200 | 0.011 | 0.050 | 0.101 | 0.018 | 0.114 | 0.050 | 0.133 | 0.285
total_injured | -0.026 | 0.038 | 0.999 | 0.003 | 0.410 | -0.004 | 0.783 | 0.012 | 0.146 | 0.034 | 0.143 | 1.000 | 0.003 | 0.004 | 0.043 | 0.000 | 0.074 | 0.090 | 0.007 | 0.003 | 0.013 | 0.005 | 0.039 | 0.008 | 0.069 | 0.010
total_killed | -0.001 | 0.003 | 0.002 | 0.999 | -0.002 | 0.716 | 0.008 | 0.619 | 0.011 | -0.004 | 0.011 | 0.003 | 1.000 | 0.005 | 0.738 | 0.113 | 0.000 | 0.000 | 0.002 | 0.000 | 0.011 | 0.002 | 0.021 | 0.002 | 0.700 | 0.010
number_of_cyclist_injured | 0.017 | 0.037 | 0.004 | 0.005 | 0.000 | 0.002 | 0.004 | 0.001 | 0.039 | 0.023 | 0.038 | 0.004 | 0.005 | 1.000 | 0.021 | 0.209 | 0.000 | 0.000 | 0.002 | 0.026 | 0.049 | 0.001 | 0.030 | 0.032 | 0.221 | 0.020
number_of_cyclist_killed | 0.001 | 0.002 | 0.043 | 0.737 | 0.167 | 0.707 | 0.000 | 0.000 | 0.004 | 0.003 | 0.005 | 0.043 | 0.738 | 0.021 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.002 | 0.011 | 0.000 | 0.008 | 0.002 | 0.206 | 0.003
contributing_factor_vehicle_3 | 0.094 | 0.069 | 0.000 | 0.113 | 0.000 | 0.029 | 0.000 | 0.031 | 0.213 | 0.065 | 0.197 | 0.000 | 0.113 | 0.209 | 1.000 | 1.000 | 0.738 | 0.796 | 0.023 | 0.058 | 0.000 | 0.000 | 0.132 | 0.110 | 0.078 | 0.389
contributing_factor_vehicle_4 | 0.108 | 0.066 | 0.074 | 0.000 | 0.690 | 0.176 | 0.043 | 0.000 | 0.263 | 0.093 | 0.206 | 0.074 | 0.000 | 0.000 | 1.000 | 0.738 | 1.000 | 0.807 | 0.026 | 0.084 | 0.000 | 0.000 | 0.088 | 0.129 | 0.000 | 0.440
contributing_factor_vehicle_5 | 0.086 | 0.030 | 0.090 | 0.000 | 0.000 | 1.000 | 0.102 | 0.000 | 0.354 | 0.115 | 0.200 | 0.090 | 0.000 | 0.000 | 1.000 | 0.796 | 0.807 | 1.000 | 0.082 | 0.131 | 0.264 | 0.150 | 0.056 | 0.163 | 0.000 | 0.473
crash_day | 0.011 | 0.013 | 0.007 | 0.002 | 0.001 | 0.001 | 0.007 | 0.004 | 0.016 | 0.075 | 0.011 | 0.007 | 0.002 | 0.002 | 0.000 | 0.023 | 0.026 | 0.082 | 1.000 | 0.012 | 0.466 | 0.198 | 0.021 | 0.012 | 0.008 | 0.010
crash_month | 0.006 | 0.005 | 0.003 | 0.000 | 0.000 | 0.002 | 0.003 | 0.001 | 0.108 | 0.016 | 0.050 | 0.003 | 0.000 | 0.026 | 0.002 | 0.058 | 0.084 | 0.131 | 0.012 | 1.000 | 0.994 | 0.128 | 0.012 | 0.009 | 0.017 | 0.025
holiday_name | 0.030 | 0.023 | 0.013 | 0.011 | 0.005 | 0.000 | 0.015 | 0.014 | 0.152 | 0.069 | 0.101 | 0.013 | 0.011 | 0.049 | 0.011 | 0.000 | 0.000 | 0.264 | 0.466 | 0.994 | 1.000 | 1.000 | 0.032 | 0.037 | 0.053 | 0.049
is_public_holiday | 0.013 | 0.009 | 0.005 | 0.002 | 0.005 | 0.004 | 0.005 | 0.000 | 0.013 | 0.029 | 0.018 | 0.005 | 0.002 | 0.001 | 0.000 | 0.000 | 0.000 | 0.150 | 0.198 | 0.128 | 1.000 | 1.000 | 0.011 | 0.011 | 0.004 | 0.000
Number_of_involved_Vehicles | 0.037 | 0.037 | 0.039 | 0.021 | 0.012 | 0.024 | 0.038 | 0.011 | 0.114 | 0.055 | 0.114 | 0.039 | 0.021 | 0.030 | 0.008 | 0.132 | 0.088 | 0.056 | 0.021 | 0.012 | 0.032 | 0.011 | 1.000 | 0.041 | 0.137 | 0.102
BoroName | 0.648 | 0.686 | 0.008 | 0.002 | 0.002 | 0.000 | 0.008 | 0.004 | 0.059 | 0.027 | 0.050 | 0.008 | 0.002 | 0.032 | 0.002 | 0.110 | 0.129 | 0.163 | 0.012 | 0.009 | 0.037 | 0.011 | 0.041 | 1.000 | 0.043 | 0.041
severity | 0.043 | 0.033 | 0.069 | 0.707 | 0.028 | 0.507 | 0.066 | 0.438 | 0.131 | 0.049 | 0.133 | 0.069 | 0.700 | 0.221 | 0.206 | 0.078 | 0.000 | 0.000 | 0.008 | 0.017 | 0.053 | 0.004 | 0.137 | 0.043 | 1.000 | 0.058
location_type | 0.066 | 0.054 | 0.010 | 0.010 | 0.003 | 0.006 | 0.011 | 0.008 | 0.305 | 0.041 | 0.285 | 0.010 | 0.010 | 0.020 | 0.003 | 0.389 | 0.440 | 0.473 | 0.010 | 0.025 | 0.049 | 0.000 | 0.102 | 0.041 | 0.058 | 1.000
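The "is highly overall correlated with" alerts at the top of the report can be recovered from this matrix by thresholding. The sketch below copies four rows of the matrix and lists, in column order, every other variable whose absolute value exceeds a cutoff. The 0.5 cutoff is an assumption chosen for illustration, not a value read from the report's configuration; for these four rows it happens to yield the same partners the alerts name.

```python
# Sketch: derive "highly correlated" alerts from rows of the correlation matrix.
# THRESHOLD = 0.5 is an assumed cutoff, not a value taken from the report.

COLUMNS = [
    "latitude", "longitude", "number_of_persons_injured", "number_of_persons_killed",
    "number_of_pedestrians_injured", "number_of_pedestrians_killed",
    "number_of_motorist_injured", "number_of_motorist_killed", "collision_id",
    "crash_hour", "crash_year", "total_injured", "total_killed",
    "number_of_cyclist_injured", "number_of_cyclist_killed",
    "contributing_factor_vehicle_3", "contributing_factor_vehicle_4",
    "contributing_factor_vehicle_5", "crash_day", "crash_month", "holiday_name",
    "is_public_holiday", "Number_of_involved_Vehicles", "BoroName", "severity",
    "location_type",
]

# Four rows copied from the matrix, in the same column order.
ROWS = {
    "latitude": [1.000, 0.294, -0.026, -0.001, 0.003, -0.001, -0.032, -0.001, -0.011,
                 -0.010, -0.003, -0.026, -0.001, 0.017, 0.001, 0.094, 0.108, 0.086,
                 0.011, 0.006, 0.030, 0.013, 0.037, 0.648, 0.043, 0.066],
    "number_of_persons_injured": [-0.026, 0.038, 1.000, 0.002, 0.408, -0.004, 0.780,
                                  0.012, 0.148, 0.034, 0.145, 0.999, 0.002, 0.004,
                                  0.043, 0.000, 0.074, 0.090, 0.007, 0.003, 0.013,
                                  0.005, 0.039, 0.008, 0.069, 0.010],
    "number_of_persons_killed": [-0.001, 0.003, 0.002, 1.000, -0.002, 0.717, 0.008,
                                 0.618, 0.011, -0.004, 0.011, 0.003, 0.999, 0.005,
                                 0.737, 0.113, 0.000, 0.000, 0.002, 0.000, 0.011,
                                 0.002, 0.021, 0.002, 0.707, 0.010],
    "number_of_pedestrians_injured": [0.003, -0.017, 0.408, -0.002, 1.000, 0.002,
                                      -0.089, -0.004, 0.023, 0.034, 0.025, 0.410,
                                      -0.002, 0.000, 0.167, 0.000, 0.690, 0.000,
                                      0.001, 0.000, 0.005, 0.005, 0.012, 0.002,
                                      0.028, 0.003],
}

THRESHOLD = 0.5


def high_correlations(columns, rows, threshold=THRESHOLD):
    """Map each row variable to the other columns whose |value| exceeds threshold."""
    alerts = {}
    for var, values in rows.items():
        # Keep matrix column order and skip the diagonal (a variable with itself).
        partners = [col for col, value in zip(columns, values)
                    if col != var and abs(value) > threshold]
        if partners:
            alerts[var] = partners
    return alerts


def describe(var, partners):
    """Format one alert in the style of the report's Alerts list."""
    if len(partners) == 1:
        return f"{var} is highly overall correlated with {partners[0]}"
    rest = len(partners) - 1
    noun = "field" if rest == 1 else "fields"
    return f"{var} is highly overall correlated with {partners[0]} and {rest} other {noun}"


if __name__ == "__main__":
    for var, partners in high_correlations(COLUMNS, ROWS).items():
        print(describe(var, partners))
```

Because the matrix is symmetric, scanning every row this way reports each strong pair twice, once from each side, just as the Alerts list does for collision_id and crash_year.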

Missing values

[Figure: missing-value count by column] A simple visualization of nullity by column.
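A bar chart of nullity by column reduces, roughly, to a per-column count of non-missing cells. A minimal sketch, assuming rows are dictionaries in which a missing value (NaN in the sample) is stored as None; the data are the ZIP and street fields of sample rows 2, 9, 12 and 13, and the text bars stand in for the plotted ones.

```python
# Sketch: per-column count of non-missing values, drawn as a text bar chart.
# Missing cells (NaN in the report) are represented as None here.

COLUMNS = ["zip_code", "on_street_name", "cross_street_name", "off_street_name"]

# ZIP and street fields of sample rows 2, 9, 12 and 13.
ROWS = [
    {"zip_code": "11230", "on_street_name": "OCEAN PARKWAY",
     "cross_street_name": "AVENUE K", "off_street_name": None},
    {"zip_code": "11208", "on_street_name": None,
     "cross_street_name": None, "off_street_name": "1211 LORING AVENUE"},
    {"zip_code": None, "on_street_name": "BROOKLYN QUEENS EXPRESSWAY",
     "cross_street_name": None, "off_street_name": None},
    {"zip_code": "10475", "on_street_name": None,
     "cross_street_name": None, "off_street_name": "344 BAYCHESTER AVENUE"},
]


def nullity_counts(rows, columns):
    """Count the non-missing (not None) cells in each column."""
    return {col: sum(1 for row in rows if row.get(col) is not None) for col in columns}


def text_bar_chart(counts, total, width=20):
    """Render one bar per column, scaled so a complete column fills `width`."""
    lines = []
    for col, n in counts.items():
        filled = round(width * n / total) if total else 0
        lines.append(f"{col:<18} {'#' * filled:<{width}} {n}/{total}")
    return lines


if __name__ == "__main__":
    counts = nullity_counts(ROWS, COLUMNS)
    for line in text_bar_chart(counts, len(ROWS)):
        print(line)
```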
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you visually pick out patterns in data completion.
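In text form, a nullity matrix is one line per row with a mark for each filled cell. The sketch below uses '#' for present and '.' for missing (symbols chosen here for illustration, with None standing in for NaN) on the ZIP and street fields of sample rows 2169675, 2169676, 2169677 and 2169682. Even this small extract shows a completion pattern: on_street_name and off_street_name are never filled in the same row.

```python
# Sketch: a text nullity matrix. '#' marks a filled cell, '.' a missing one
# (None stands in for NaN); the trailing number counts filled cells per row.

COLUMNS = ["zip_code", "on_street_name", "cross_street_name", "off_street_name"]

# ZIP and street fields of sample rows 2169675, 2169676, 2169677 and 2169682.
ROWS = [
    {"zip_code": "10003", "on_street_name": "E 20 ST",
     "cross_street_name": "2 AVE", "off_street_name": None},
    {"zip_code": "11418", "on_street_name": "113 ST",
     "cross_street_name": "JAMAICA AVE", "off_street_name": None},
    {"zip_code": "10461", "on_street_name": None,
     "cross_street_name": None, "off_street_name": "2007 WILLIAMSBRIDGE RD"},
    {"zip_code": None, "on_street_name": "FDR DRIVE",
     "cross_street_name": None, "off_street_name": None},
]


def nullity_matrix(rows, columns):
    """Return one string per row: a mark per column plus the row's filled-cell count."""
    lines = []
    for row in rows:
        marks = "".join("#" if row.get(col) is not None else "." for col in columns)
        lines.append(f"{marks} {marks.count('#')}")
    return lines


if __name__ == "__main__":
    for line in nullity_matrix(ROWS, COLUMNS):
        print(line)
```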
[Figure: nullity correlation heatmap] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
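One way to compute nullity correlation is the Pearson correlation between two missingness indicators (1 when the cell is missing, 0 otherwise). This is a sketch of that idea, not the report's own code; the plotting library behind the heatmap may handle edge cases differently. The rows are on_street_name, cross_street_name and off_street_name from the first ten sample rows, with None for NaN. A value of -1 means one field is filled exactly when the other is empty; a positive value means the two tend to be missing together.

```python
import math

COLUMNS = ("on_street_name", "cross_street_name", "off_street_name")

# (on_street_name, cross_street_name, off_street_name) for sample rows 2 to 20.
ROWS = [
    ("OCEAN PARKWAY", "AVENUE K", None),
    (None, None, "1211 LORING AVENUE"),
    ("BROOKLYN QUEENS EXPRESSWAY", None, None),
    (None, None, "344 BAYCHESTER AVENUE"),
    (None, None, "2047 PITKIN AVENUE"),
    ("3 AVENUE", "EAST 43 STREET", None),
    ("MYRTLE AVENUE", None, None),
    ("SPRINGFIELD BOULEVARD", "EAST GATE PLAZA", None),
    ("BELT PARKWAY", None, None),
    ("NORTH CONDUIT AVENUE", "150 STREET", None),
]


def pearson(xs, ys):
    """Pearson correlation; None when either series is constant (undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


def nullity_correlation(rows, columns, a, b):
    """Correlate the missingness indicators (1 = missing) of columns a and b."""
    ia, ib = columns.index(a), columns.index(b)
    xs = [1 if row[ia] is None else 0 for row in rows]
    ys = [1 if row[ib] is None else 0 for row in rows]
    return pearson(xs, ys)


if __name__ == "__main__":
    print(nullity_correlation(ROWS, COLUMNS, "on_street_name", "off_street_name"))
    print(nullity_correlation(ROWS, COLUMNS, "on_street_name", "cross_street_name"))
```

On these ten rows the first call gives -1 (the two fields are mutually exclusive) and the second about 0.53 (a missing on_street_name usually comes with a missing cross_street_name).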

Sample

First rows
index | crash_date | crash_time | zip_code | latitude | longitude | location | on_street_name | cross_street_name | off_street_name | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_cyclist_injured | number_of_cyclist_killed | number_of_motorist_injured | number_of_motorist_killed | contributing_factor_vehicle_1 | contributing_factor_vehicle_2 | contributing_factor_vehicle_3 | contributing_factor_vehicle_4 | contributing_factor_vehicle_5 | collision_id | vehicle_type_code_1 | vehicle_type_code_2 | vehicle_type_code_3 | vehicle_type_code_4 | vehicle_type_code_5 | crash_hour | crash_day | crash_month | crash_year | holiday_name | is_public_holiday | Number_of_involved_Vehicles | geometry | BoroName | total_injured | total_killed | severity | location_type
2 | 2023-11-01 | 01:29:00 | 11230 | 40.621790 | -73.970024 | (40.62179, -73.970024) | OCEAN PARKWAY | AVENUE K | NaN | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | None | None | None | None | None | 4675373 | Moped | Sedan | Sedan | NaN | NaN | 1 | Wednesday | November | 2023 | NaN | 0 | 3 | POINT (-73.97002 40.62179) | Brooklyn | 2.0 | 0.0 | Injury | intersection
9 | 2021-09-11 | 09:35:00 | 11208 | 40.667202 | -73.866500 | (40.667202, -73.8665) | NaN | NaN | 1211 LORING AVENUE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | None | None | None | None | None | 4456314 | Sedan | NaN | NaN | NaN | NaN | 9 | Saturday | September | 2021 | NaN | 0 | 1 | POINT (-73.8665 40.6672) | Brooklyn | 0.0 | 0.0 | No Casualty | off_street
12 | 2021-12-14 | 17:05:00 | NaN | 40.709183 | -73.956825 | (40.709183, -73.956825) | BROOKLYN QUEENS EXPRESSWAY | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | passing too closely | None | None | None | None | 4486555 | Sedan | Tractor Truck Diesel | NaN | NaN | NaN | 17 | Tuesday | December | 2021 | NaN | 0 | 2 | POINT (-73.95682 40.70918) | Brooklyn | 0.0 | 0.0 | No Casualty | mid_block
13 | 2021-12-14 | 08:17:00 | 10475 | 40.868160 | -73.831480 | (40.86816, -73.83148) | NaN | NaN | 344 BAYCHESTER AVENUE | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | None | None | None | None | None | 4486660 | Sedan | Sedan | NaN | NaN | NaN | 8 | Tuesday | December | 2021 | NaN | 0 | 2 | POINT (-73.83148 40.86816) | Bronx | 4.0 | 0.0 | Injury | off_street
14 | 2021-12-14 | 21:10:00 | 11207 | 40.671720 | -73.897100 | (40.67172, -73.8971) | NaN | NaN | 2047 PITKIN AVENUE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | driver inexperience | None | None | None | None | 4487074 | Sedan | NaN | NaN | NaN | NaN | 21 | Tuesday | December | 2021 | NaN | 0 | 1 | POINT (-73.8971 40.67172) | Brooklyn | 0.0 | 0.0 | No Casualty | off_street
15 | 2021-12-14 | 14:58:00 | 10017 | 40.751440 | -73.973970 | (40.75144, -73.97397) | 3 AVENUE | EAST 43 STREET | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | passing too closely | None | None | None | None | 4486519 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | 14 | Tuesday | December | 2021 | NaN | 0 | 2 | POINT (-73.97397 40.75144) | Manhattan | 0.0 | 0.0 | No Casualty | intersection
16 | 2021-12-13 | 00:34:00 | NaN | 40.701275 | -73.888870 | (40.701275, -73.88887) | MYRTLE AVENUE | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | passing or lane usage improper | None | None | None | None | 4486934 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN | 0 | Monday | December | 2021 | NaN | 0 | 1 | POINT (-73.88887 40.70128) | Queens | 0.0 | 0.0 | No Casualty | mid_block
17 | 2021-12-14 | 16:50:00 | 11413 | 40.675884 | -73.755770 | (40.675884, -73.75577) | SPRINGFIELD BOULEVARD | EAST GATE PLAZA | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | turning improperly | None | None | None | None | 4487127 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | 16 | Tuesday | December | 2021 | NaN | 0 | 2 | POINT (-73.75577 40.67588) | Queens | 0.0 | 0.0 | No Casualty | intersection
19 | 2021-12-14 | 00:59:00 | NaN | 40.596620 | -74.002310 | (40.59662, -74.00231) | BELT PARKWAY | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unsafe speed | None | None | None | None | 4486564 | Sedan | NaN | NaN | NaN | NaN | 0 | Tuesday | December | 2021 | NaN | 0 | 1 | POINT (-74.00231 40.59662) | Brooklyn | 0.0 | 0.0 | No Casualty | mid_block
20 | 2021-12-14 | 23:10:00 | 11434 | 40.666840 | -73.789410 | (40.66684, -73.78941) | NORTH CONDUIT AVENUE | 150 STREET | NaN | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | reaction to uninvolved vehicle | None | None | None | None | 4486635 | Sedan | Sedan | NaN | NaN | NaN | 23 | Tuesday | December | 2021 | NaN | 0 | 2 | POINT (-73.78941 40.66684) | Queens | 4.0 | 0.0 | Injury | intersection
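The crash_hour, crash_day, crash_month and crash_year values in these rows agree with what can be computed from crash_date and crash_time. The helper below is hypothetical (the pipeline that produced the dataset is not shown in this report); it spells out weekday and month names explicitly so the result does not depend on the system locale.

```python
from datetime import datetime

# English names, listed explicitly so the output is locale-independent.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def derive_calendar_columns(crash_date, crash_time):
    """Hypothetical helper: rebuild the calendar columns from date and time strings."""
    ts = datetime.strptime(f"{crash_date} {crash_time}", "%Y-%m-%d %H:%M:%S")
    return {
        "crash_hour": ts.hour,
        "crash_day": WEEKDAYS[ts.weekday()],  # weekday(): Monday == 0
        "crash_month": MONTHS[ts.month - 1],
        "crash_year": ts.year,
    }


if __name__ == "__main__":
    print(derive_calendar_columns("2023-11-01", "01:29:00"))
```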

Last rows
index | crash_date | crash_time | zip_code | latitude | longitude | location | on_street_name | cross_street_name | off_street_name | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_cyclist_injured | number_of_cyclist_killed | number_of_motorist_injured | number_of_motorist_killed | contributing_factor_vehicle_1 | contributing_factor_vehicle_2 | contributing_factor_vehicle_3 | contributing_factor_vehicle_4 | contributing_factor_vehicle_5 | collision_id | vehicle_type_code_1 | vehicle_type_code_2 | vehicle_type_code_3 | vehicle_type_code_4 | vehicle_type_code_5 | crash_hour | crash_day | crash_month | crash_year | holiday_name | is_public_holiday | Number_of_involved_Vehicles | geometry | BoroName | total_injured | total_killed | severity | location_type
2169675 | 2025-04-09 | 19:08:00 | 10003 | 40.736020 | -73.98227 | (40.73602, -73.98227) | E 20 ST | 2 AVE | NaN | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | unsafe speed | None | None | None | None | 4806318 | Bike | Bike | NaN | NaN | NaN | 19 | Wednesday | April | 2025 | NaN | 0 | 2 | POINT (-73.98227 40.73602) | Manhattan | 2.0 | 0.0 | Injury | intersection
2169676 | 2025-04-15 | 18:31:00 | 11418 | 40.697830 | -73.83564 | (40.69783, -73.83564) | 113 ST | JAMAICA AVE | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | backing unsafely | None | None | None | None | 4806035 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | 18 | Tuesday | April | 2025 | NaN | 0 | 2 | POINT (-73.83564 40.69783) | Queens | 0.0 | 0.0 | No Casualty | intersection
2169677 | 2025-04-15 | 15:52:00 | 10461 | 40.854298 | -73.85492 | (40.854298, -73.85492) | NaN | NaN | 2007 WILLIAMSBRIDGE RD | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | driver inattention/distraction | None | None | None | None | 4805948 | Sedan | Sedan | NaN | NaN | NaN | 15 | Tuesday | April | 2025 | NaN | 0 | 2 | POINT (-73.85492 40.8543) | Bronx | 0.0 | 0.0 | No Casualty | off_street
2169678 | 2025-04-15 | 20:00:00 | 11366 | 40.728012 | -73.78483 | (40.728012, -73.78483) | UNION TPKE | 184 ST | NaN | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | traffic control disregarded | None | None | None | None | 4806383 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | 20 | Tuesday | April | 2025 | NaN | 0 | 2 | POINT (-73.78483 40.72801) | Queens | 4.0 | 0.0 | Injury | intersection
2169679 | 2025-04-15 | 14:30:00 | 10036 | 40.757553 | -73.98551 | (40.757553, -73.98551) | NaN | NaN | 1516 BROADWAY | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | traffic control disregarded | None | None | None | None | 4806096 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN | 14 | Tuesday | April | 2025 | NaN | 0 | 1 | POINT (-73.98551 40.75755) | Manhattan | 2.0 | 0.0 | Injury | off_street
2169680 | 2025-04-15 | 23:20:00 | 11691 | 40.610480 | -73.75028 | (40.61048, -73.75028) | NaN | NaN | 12-50 REDFERN AVE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | view obstructed/limited | None | None | None | None | 4806081 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN | 23 | Tuesday | April | 2025 | NaN | 0 | 1 | POINT (-73.75028 40.61048) | Queens | 0.0 | 0.0 | No Casualty | off_street
2169681 | 2025-04-07 | 08:50:00 | 11221 | 40.695114 | -73.91186 | (40.695114, -73.91186) | PUTNAM AVE | KNICKERBOCKER AVE | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | backing unsafely | None | None | None | None | 4806432 | Sedan | NaN | NaN | NaN | NaN | 8 | Monday | April | 2025 | NaN | 0 | 1 | POINT (-73.91186 40.69511) | Brooklyn | 0.0 | 0.0 | No Casualty | intersection
2169682 | 2025-04-15 | 05:58:00 | NaN | 40.761272 | -73.95571 | (40.761272, -73.95571) | FDR DRIVE | NaN | NaN | 2 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | driver inattention/distraction | None | None | None | None | 4806221 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | 5 | Tuesday | April | 2025 | NaN | 0 | 2 | POINT (-73.95571 40.76127) | Manhattan | 4.0 | 0.0 | Injury | mid_block
2169684 | 2025-04-14 | 21:25:00 | 11436 | 40.675716 | -73.79124 | (40.675716, -73.79124) | NaN | NaN | 147-06 123 AVE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | turning improperly | None | None | None | None | 4806294 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN | 21 | Monday | April | 2025 | NaN | 0 | 1 | POINT (-73.79124 40.67572) | Queens | 0.0 | 0.0 | No Casualty | off_street
2169686 | 2025-03-23 | 13:00:00 | 10462 | 40.836330 | -73.85505 | (40.83633, -73.85505) | NaN | NaN | 1502 OLMSTEAD AVE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | None | None | None | None | None | 4806253 | Sedan | NaN | NaN | NaN | NaN | 13 | Sunday | March | 2025 | NaN | 0 | 1 | POINT (-73.85505 40.83633) | Bronx | 0.0 | 0.0 | No Casualty | off_street
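Several derived columns follow simple rules in every sample row: total_injured equals number_of_persons_injured plus the pedestrian, cyclist and motorist injury counts (so injured people appear to be counted twice, which fits its 0.999 correlation with number_of_persons_injured); severity is "Injury" when total_injured is positive and "No Casualty" otherwise; and location_type depends on which street fields are filled. The sketch below encodes those rules. They are inferred from these 20 rows only, total_killed is assumed to mirror total_injured, and because no fatal crash appears in the sample the severity label for crashes with deaths is left as None rather than guessed.

```python
def derive_columns(row):
    """Hypothetical rules inferred from the 20 sample rows; not documented dataset logic."""
    total_injured = float(row["number_of_persons_injured"] + row["number_of_pedestrians_injured"]
                          + row["number_of_cyclist_injured"] + row["number_of_motorist_injured"])
    # Assumed by symmetry with total_injured; every sampled row has zero deaths.
    total_killed = float(row["number_of_persons_killed"] + row["number_of_pedestrians_killed"]
                         + row["number_of_cyclist_killed"] + row["number_of_motorist_killed"])
    if total_killed > 0:
        severity = None  # no fatal crash in the sample, so its label is unknown here
    elif total_injured > 0:
        severity = "Injury"
    else:
        severity = "No Casualty"
    if row.get("off_street_name") is not None:
        location_type = "off_street"
    elif row.get("on_street_name") is not None and row.get("cross_street_name") is not None:
        location_type = "intersection"
    elif row.get("on_street_name") is not None:
        location_type = "mid_block"
    else:
        location_type = None  # no row without any street field appears in the sample
    return {"total_injured": total_injured, "total_killed": total_killed,
            "severity": severity, "location_type": location_type}


# Sample row 2169682 (FDR DRIVE); missing street names are None.
ROW = {
    "on_street_name": "FDR DRIVE", "cross_street_name": None, "off_street_name": None,
    "number_of_persons_injured": 2, "number_of_persons_killed": 0,
    "number_of_pedestrians_injured": 1, "number_of_pedestrians_killed": 0,
    "number_of_cyclist_injured": 0, "number_of_cyclist_killed": 0,
    "number_of_motorist_injured": 1, "number_of_motorist_killed": 0,
}

if __name__ == "__main__":
    print(derive_columns(ROW))
```

For row 2169682 this reproduces the values shown in the table: total_injured 4.0, total_killed 0.0, severity Injury and location_type mid_block.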